[jats-list] Cryptographic hashes as identifiers of files without central authority

Subject: [jats-list] Cryptographic hashes as identifiers of files without central authority
From: "Castedo Ellerman castedo@xxxxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 5 May 2022 02:40:21 -0000
It was great meeting so many of you screen-to-screen at JATS-Con. Thank you organizers for organizing!

At the end of the conference I was talking about cryptographic hashes. In the space of JATS, DOIs and versioning, it seems like a technology worth being aware of. I have done a bit of learning about it myself. Below is a demonstration I think is worth sharing.

The IANA keeps track of URI schemes like "http:" and "doi:" [1]. There is a relatively new one "swh:" which uses a cryptographic hash to identify a file. For all practical purposes it is a unique identifier of a specific file. That's a reasonable take-away summary even though it's a little bit more complicated than that.

Here's the demonstration: I searched the web, found a random JATS XML file and downloaded it. This JATS XML file looks like it was created by Mulberry Technologies, and eLife is using it for testing. Without using any central authority, I was able to calculate the "swh:" identifier using the open-source command git hash-object. Here's the URI that I calculated:

swh:1:cnt:8a9b5eee3aa7bc13a23be113e2bfa1eef442e009
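
For anyone who wants to see what the computation involves, here is a minimal sketch in Python. It assumes the "cnt" identifier is the same value git hash-object produces for a file, which is a SHA-1 over a short "blob <size>" header followed by the file's bytes. The function name swh_content_id is my own, made up for illustration, and is not part of any library.

```python
import hashlib


def swh_content_id(data: bytes) -> str:
    """Return a swh:1:cnt: identifier for raw file contents (illustrative helper)."""
    # Git hashes a blob as SHA-1 over the header b"blob <size>\0" plus the raw bytes.
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return "swh:1:cnt:" + digest


# Small in-memory example; for a real file, read it in binary mode
# (open(path, "rb").read()) and pass the bytes in.
print(swh_content_id(b"hello world\n"))
# prints swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Reading the file in binary mode matters: any change to the bytes, including line endings, gives a different identifier.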

Anybody can archive this file and anybody can calculate this identifier from just the file. Now it just so happens that archive.softwareheritage.org does have a copy of this file. And they support retrieving files based on this identifier. But they didn't assign the ID. The ID is just a mathematical computation from the bits. Since they have a copy of the file, this works:

https://archive.softwareheritage.org/swh:1:cnt:8a9b5eee3aa7bc13a23be113e2bfa1eef442e009

No central authority is required. If anybody has the file, the ID can be computed. I suspect there are some good uses of this in the JATS/DOI space, especially with regard to specific, fixed, unchangeable versions of certain files.
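
One way this could be used for fixed versions is to check, offline, that a copy of a file is exactly the version an identifier refers to. The sketch below is only an illustration under the same assumption as git hash-object (SHA-1 over a "blob <size>" header plus the bytes); the function name matches_swhid is hypothetical, and it only handles "cnt" identifiers.

```python
import hashlib


def matches_swhid(data: bytes, swhid: str) -> bool:
    """Check whether raw file bytes correspond to a swh:1:cnt: identifier (illustrative)."""
    # Ignore any optional qualifiers that may follow the core identifier after ';'.
    core = swhid.split(";")[0]
    parts = core.split(":")
    if len(parts) != 4 or parts[:3] != ["swh", "1", "cnt"]:
        raise ValueError("expected an identifier of the form swh:1:cnt:<hex digest>")
    # Recompute the git-style blob hash and compare it with the digest in the ID.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest() == parts[3].lower()


# The empty file has a well-known git blob hash, used here as a small check.
print(matches_swhid(b"", "swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"))
# prints True
```

No network access or archive lookup is involved; the check depends only on the file's bytes and the identifier.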

Files can be submitted to softwareheritage.org today; they are archived, and a link like the one above works. However, this scenario is not their main focus right now, and the process is not user-friendly for it. But you can see this is a very real, well-defined technology.

Best regards,
Castedo Ellerman

[1] https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
