It was great meeting so many of you screen-to-screen at JATS-Con. Thank
you to the organizers for putting it together!
At the end of the conference I was talking about cryptographic hashes.
In the space of JATS, DOIs, and versioning, this seems like a technology
worth being aware of. I have done a bit of learning about it myself.
Below is a demonstration I think is worth sharing.
The IANA keeps track of URI schemes like "http:" and "doi:" [1]. A
relatively new one is "swh:", which uses a cryptographic hash to
identify a file. For all practical purposes it is a unique identifier of
a specific file. That's a reasonable take-away summary, even though it's
a little bit more complicated than that.
Here's the demonstration: I searched the web, found a random JATS XML
file, and downloaded it. This JATS XML file looks like it was created by
Mulberry Technologies, and eLife is using it for testing. Without using
any central authority I am able to calculate the "swh:" identifier (using
the open-source git hash-object command). Here's the URI that I calculated:
swh:1:cnt:8a9b5eee3aa7bc13a23be113e2bfa1eef442e009
Anybody can archive this file and anybody can calculate this identifier
from just the file. Now it just so happens that
archive.softwareheritage.org does have a copy of this file. And they
support retrieving files based on this identifier. But they didn't
assign the ID. The ID is just a mathematical computation from the bits.
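As a minimal sketch of that computation: as far as I understand it, the
hash in a "swh:1:cnt:" identifier is the same SHA-1 that git hash-object
produces for a file, namely the SHA-1 of the header "blob <size>\0"
followed by the file's bytes. The function name swhid_for_content below
is only illustrative, not part of any library.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Return the "swh:1:cnt:" identifier for the given file bytes.

    Assumption: the content hash is the git blob hash, i.e. the SHA-1 of
    b"blob <size>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return "swh:1:cnt:" + digest


# An empty file has a well-known git blob hash.
print(swhid_for_content(b""))
# prints swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Reading the downloaded file in binary mode and passing its bytes to this
function should give the same URI as git hash-object does.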
Since they have a copy of the file, this works:
https://archive.softwareheritage.org/swh:1:cnt:8a9b5eee3aa7bc13a23be113e2bfa1eef442e009
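The link above is just the archive's address with the identifier
appended. A small sketch of building such a link follows; the helper name
archive_url and the format check are my own assumptions for illustration,
and the check only covers the "cnt" (file content) form used in this
message.

```python
import re

# Assumed shape of a content identifier: "swh:1:cnt:" plus 40 lowercase hex digits.
SWHID_CONTENT = re.compile(r"swh:1:cnt:[0-9a-f]{40}")
ARCHIVE_BASE = "https://archive.softwareheritage.org/"


def archive_url(swhid: str) -> str:
    """Build the archive link for a content identifier, rejecting malformed input."""
    if not SWHID_CONTENT.fullmatch(swhid):
        raise ValueError("not a swh:1:cnt: identifier: " + repr(swhid))
    return ARCHIVE_BASE + swhid


print(archive_url("swh:1:cnt:8a9b5eee3aa7bc13a23be113e2bfa1eef442e009"))
```

The link only resolves if the archive actually holds a copy of the file;
building it requires no lookup.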
No central authority is required. If anybody has the file, the ID can be
computed. I suspect there are some good uses of this in the JATS/DOI
space, especially with regard to specific, fixed, unchangeable versions
of certain files.
Files can be submitted to softwareheritage.org today; they are archived,
and a link like the one above then works. However, this scenario is not
their main focus right now, and the process is not user-friendly for it.
But you can see this is a very real, well-defined technology.
Best regards,
B B Castedo Ellerman
[1] https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml