Subject: Re: [jats-list] Why is archiving JATS with a DOI not common?
From: "Alexander Schwarzman aschwarzman@xxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 1 May 2022 02:40:23 -0000
Hi Castedo,

Let me try to clarify the confusing terminology that surrounds the notion of DOI and also comment on the difference between the Accepted Manuscript and the Published Article.

The dirty little secret is that DOI -- which, in principle, stands for Digital Object Identifier -- does not identify any single digital object (sigh). Instead, it identifies a set of objects. A DOI does not resolve to any particular representation of the content; instead, a DOI resolves to something called "a response page" or "a landing page", which is a web page that contains links to *some* manifestations of the content in question (more on that later).

A little historical digression: in the beginning of electronic publishing, there were efforts to establish an ID that would identify the content, rather than its manifestations. There were also efforts to assign different IDs to different manifestations, e.g., to the PDF, the HTML, and the EPUB manifestations, so that different digital objects would get different IDs. Both efforts, unfortunately, have failed, and now the so-called "Digital Object Identifier" identifies a whole bunch of digital objects, e.g., an XML source, its PDF, HTML, and EPUB manifestations, as well as a pre-published manuscript, whose content is different from the final published article.

Now, when a DOI is assigned, there are many Registration Agencies <https://www.doi.org/registration_agencies.html> (RAs) that can register the DOI. You can follow the links at the site to find out more. Crossref is just one of many RAs.

So, when you ask about resolving a DOI to the XML format of the content in question, this is not, technically speaking, correct: a DOI resolves to a response page, which may or may not provide a link to the XML format of the content you are interested in.

The logical question then is to ask: Why don't all publishers provide the XML source of their content? Some publishers do (e.g., PLOS, The Royal Society) while most don't.
You see, articles are published under different licences: some are proprietary; and others, such as CC BY-ND 4.0, allow free sharing but no derivatives. If all XML were made publicly available, then it would be very easy for unscrupulous actors to create a different manifestation of that XML and publish it, to compile a collection of existing articles, etc. A publisher, especially a not-for-profit or a small one, or a publisher that has subscription journals, simply doesn't have the wherewithal or financial resources to police that kind of nefarious activity, not to mention to engage in expensive lawsuits, especially if the culprit is located outside the jurisdiction of the publisher's country. And thus, the XML is not always available. For example, I don't believe you can get the XML of your article https://doi.org/10.1016/j.ic.2006.10.007, which you published with Elsevier in 2007, because it is under the Elsevier user license.

You've also asked about the difference between the Accepted Manuscript (AM) (a.k.a. "ahead-of-print manuscript", "author manuscript", etc.) and the final published article. There are differences between the version that was peer-reviewed and scientifically accepted and the final version (Version of Record, VoR).

I'll refer you to the list of 102 Things Journal Publishers Do <https://scholarlykitchen.sspnet.org/2018/02/06/focusing-value-102-things-journal-publishers-2018-update/> for the complete list; some of the things worth mentioning in the context of highlighting the differences between the AM and VoR are 34. Copy-editing, proofreading, and styling; 35. Language and substantive editing; 37. Art handling; 39. Layout and composition; 41. XML generation and DTD migration; 44. Tagging; 45. DOI registration; 57. Depositing content and data; and 60. Hosting and archiving; to mention just a few.
The publishers add value to the peer-reviewed content, and that is why, in your example, Elsevier requests $25 for the final version of the article.

Returning to what I alluded to in the beginning of my message, the same DOI, unfortunately, refers not only to the various manifestations of the Version of Record, but also to the Accepted Manuscript, whose content is different from the VoR. In my opinion, this is a bloody mess, but this is what the educated consumer should be aware of ("buyer beware"). If it is any consolation, at least a preprint (which may or may not become a journal article) has a different DOI.

Finally, DOI is not the only identifier out there. PubMed ID is a different identifier.

If this is clear as mud, I'm sorry.

--Sasha

Alexander ('Sasha') Schwarzman
Content Technology Architect
tel: +1.202.416.1979
aschwarzman@xxxxxxxxxx

On Sat, Apr 30, 2022 at 5:06 PM Castedo Ellerman castedo@xxxxxxxxxxx <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> On 4/29/22 07:26, David Haber wrote:
> > After reading your question a few times more, are you asking why the
> > specific XML component or format of a given article does not have its
> > own unique DOI?
>
> More or less, yes, that is what I was wondering, thank you.
>
> > So, perhaps a publisher HTML version would have a DOI, maybe the PDF
> > would have a DOI, perhaps an ePub would have a DOI, and maybe the XML?
> > And all these dois would be unique?
> >
> > If that is your question, then the reason is that the article is the
> > unit of measure in scholarly publishing, and those other versions are
> > just that, versions or different formats. The content is not unique to
> > the format so therefore would not get a separate doi. It is true that
> > different formats may display a piece of an article differently (or
> > maybe not at all) but that does not make the format unique because the
> > DOI represents the entire published object and all its formats because
> > that is the unique piece we as publishers are shepherding to the world.
>
> I have some clarifications to ask on a few of the terms you've used. I
> ask specifically about the DOI 10.1016/j.tpb.2018.03.006. Here are three
> ways I can resolve that DOI to three different digital objects:
>
> 1) Via doi.org I am sent to a web page where Elsevier requests $25 to
> view a PDF file.
>
> 2) In Zotero I can enter the doi and I get a free PDF (which is labeled
> Author manuscript)
>
> 3) I can enter the DOI on PubMed Central and freely see an HTML page
> (also labeled Author manuscript)
>
> I assume 1) resolves to different content than 2) and 3) because
> Elsevier wants $25.
>
> So we have one DOI which is representing two different sets of content
> here? Or does the DOI represent only the $25 article and not the author
> manuscript?
>
> What is the unit of measure in scholarly publishing in this case?
>
> Is the Author manuscript provided by PubMed Central and Zotero part of
> the entire published object or not part?
>
> Is the PubMed Central web page content here not a published object?
>
> Thank you,
> Castedo
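
A small illustration of the resolution point discussed in this thread, as a minimal Python sketch. The helper names split_doi and resolver_url are hypothetical (they belong to no library); the only things assumed are that a DOI has the form prefix/suffix with a prefix beginning "10." and that doi.org is the resolver, as in the links quoted above. Nothing in the DOI string names a format, so the same URL comes out whether the reader is after the PDF, the HTML, or the XML.

```python
from typing import Tuple
from urllib.parse import quote

# Resolver base URL, as used in the doi.org links in this thread.
DOI_RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash.

    Hypothetical helper: it only checks the general shape
    (prefix starting with "10.", non-empty suffix).
    """
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build the resolver URL for a DOI.

    The URL carries no format information: what the reader ends up
    with (landing page, PDF, XML, ...) is decided by whoever answers
    the request, not by the identifier.
    """
    prefix, suffix = split_doi(doi)
    # Percent-encode unusual characters in the suffix, keeping slashes.
    return DOI_RESOLVER + prefix + "/" + quote(suffix, safe="/")


if __name__ == "__main__":
    doi = "10.1016/j.tpb.2018.03.006"  # the DOI asked about in the quoted message
    print(split_doi(doi))
    print(resolver_url(doi))
```

The sketch deliberately stops at building the URL and makes no network request, since what comes back from the resolver depends on the publisher and the Registration Agency.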