Re: [jats-list] Cryptographic hashes as identifiers of files without central authority

Subject: Re: [jats-list] Cryptographic hashes as identifiers of files without central authority
From: "Frederick Atherden f.atherden@xxxxxxxxxxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 5 May 2022 07:36:53 -0000
Hi Castedo,

> I suspect there are some good uses of this in the jats/doi
> space. Especially with regards to specific fixed unchangeable versions
> of certain files.
>

Since 2020, eLife has been archiving all author-produced code referenced
within articles at Software Heritage. We use these hashes, in the form of a
Software Heritage ID (SWHID), to point to a specific revision of an archived
repo, but they can also point to a directory in general, or right down to
a specific line of code from a certain commit.
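
To illustrate the different kinds of object a SWHID can point to, here is a
minimal Python sketch that splits an identifier into its parts. It is not
eLife's or Software Heritage's code. The function and class names are made up
for illustration, and the type codes and the "lines" qualifier reflect my
reading of the SWHID syntax, so please check the specification before relying
on it.

```python
import re
from typing import Dict, NamedTuple

# Object type codes as I understand the SWHID syntax (an assumption to verify).
OBJECT_TYPES = {
    "cnt": "content (a single file)",
    "dir": "directory",
    "rev": "revision (a commit)",
    "rel": "release",
    "snp": "snapshot",
}

_CORE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})$")


class Swhid(NamedTuple):
    object_type: str            # one of the keys of OBJECT_TYPES
    object_id: str              # 40 hex characters
    qualifiers: Dict[str, str]  # optional ";key=value" parts


def parse_swhid(text: str) -> Swhid:
    """Split a SWHID into type, hash, and qualifiers (hypothetical helper)."""
    core, *raw_qualifiers = text.split(";")
    match = _CORE.match(core)
    if match is None:
        raise ValueError(f"not a SWHID: {text!r}")
    qualifiers = {}
    for item in raw_qualifiers:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed qualifier: {item!r}")
        qualifiers[key] = value
    return Swhid(match.group(1), match.group(2), qualifiers)


if __name__ == "__main__":
    # The hash is the one from the quoted message; the line range is invented.
    example = "swh:1:cnt:8a9b5eee3aa7bc13a23be113e2bfa1eef442e009;lines=10-12"
    parsed = parse_swhid(example)
    print(OBJECT_TYPES[parsed.object_type], parsed.object_id, parsed.qualifiers)
```

A "rev" identifier would be parsed the same way; only the type code and the
hash differ.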

These are also referenced in some existing JATS4R recommendations for
software references:
<https://jats4r.org/software-citations#example-1a-pub-id-is-software-heritage-identifier>

All the best,
Fred

On Thu, May 5, 2022 at 3:40 AM Castedo Ellerman castedo@xxxxxxxxxxx <
jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> It was great meeting so many of you screen-to-screen at JATS-Con. Thank
> you organizers for organizing!
>
> At the end of the conference I was talking about cryptographic hashes.
> In the space of JATS, DOIs and versioning, it seems a technology that is
> worth being aware of. I have done a bit of learning about it myself.
> Below is a demonstration I think worth sharing.
>
> The IANA keeps track of URI schemes like "http:" and "doi:" [1]. There
> is a relatively new one "swh:" which uses a cryptographic hash to
> identify a file. For all practical purposes it is a unique identifier of
> a specific file. That's a reasonable take-away summary even though it's
> a little bit more complicated than that.
>
> Here's the demonstration: I searched the web, found a random JATS XML
> file and downloaded it. This JATS XML file looks like it was created by
> Mulberry Technologies and eLife is using it for testing. Without using
> any central authority I am able to calculate the "swh:" identifier (using
> the open-source command git hash-object). Here's the URI that I calculated:
>
> swh:1:cnt:8a9b5eee3aa7bc13a23be113e2bfa1eef442e009
>
> Anybody can archive this file and anybody can calculate this identifier
> from just the file. Now it just so happens that
> archive.softwareheritage.org does have a copy of this file. And they
> support retrieving files based on this identifier. But they didn't
> assign the ID. The ID is just a mathematical computation from the bits.
> Since they have a copy of the file, this works:
>
>
> https://archive.softwareheritage.org/swh:1:cnt:8a9b5eee3aa7bc13a23be113e2bfa1eef442e009
>
> No central authority is required. If anybody has the file, the ID can be
> computed. I suspect there are some good uses of this in the jats/doi
> space. Especially with regards to specific fixed unchangeable versions
> of certain files.
>
> Files can be submitted to softwareheritage.org today; they are archived,
> and a link like the one above works. However, this scenario is not their
> main focus right now, and the process is not yet user-friendly. But you
> can see that this is a very real, well-defined technology.
>
> Best regards,
>     Castedo Ellerman
>
> [1] https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
> 
>
>
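
For anyone who wants to reproduce the calculation in the quoted message
without git installed, here is a minimal Python sketch. It assumes the "cnt"
identifier is the same SHA-1 that git hash-object computes for a blob, that
is, the hash of the header "blob <size>\0" followed by the file bytes. The
function name is hypothetical and this is not Software Heritage's own tooling.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Return a "swh:1:cnt:" identifier for the given file bytes.

    Assumes the git blob convention: SHA-1 over the header
    b"blob <length>\\0" followed by the raw content.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # Same input as: printf 'hello world\n' | git hash-object --stdin
    print(swhid_for_content(b"hello world\n"))
```

To check a downloaded file, read it in binary mode and pass the bytes to the
function. Any change to the bytes, including line endings, gives a different
identifier.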

-- 

Frederick Atherden

Head of Production Operations
