Subject: Re: [jats-list] citation "year" with suffix
From: Nikos Markantonatos <nikos@xxxxxxxxxx>
Date: Wed, 27 Feb 2013 12:41:33 +0200

Hi Kaveh,

> I agree that for subjects with a lot of legacy citations which cannot
> easily be structured, mixed citation might be better, but for STM
> research where most refs are structured, I think that it would
> encourage more structure.

As others have argued already, <element-citation> would simply encourage more tag abuse and hidden encoding problems. <mixed-citation> encodes only the metadata that is known for a citation and leaves everything else, including spacing and punctuation, as plain text exactly where it occurs. Clearly, we encourage XML creators to supply citations as fully marked up as possible, but there are practical reasons why this may not be possible.

> My preference is that at least for normal journal and book citation
> which has structure, the data should be pure, and devoid of textual
> embellishments like en-dashes and semi-colons. These can be put in by
> any intelligent renderer on the fly, and even produce the most
> beautiful typesetting.

Textual embellishments may appear to be less crucial than the citation metadata, yet they form an important part of archival-quality XML. Renderers, no matter how intelligent they are, will never be able to reinstate lost information. This is because the information that gets lost is not consistent from one body of content to the next, and there are always good reasons behind those inconsistencies. Over the years, and over the course of dozens of legacy content migration projects, we have found monstrous implicit rules hidden behind the idea of maintaining "pure data". We had to apply reverse engineering techniques and consult the people who had originally encoded the information.

The bottom line is that if you employed an intelligent renderer to display that content, it would be a multi-week effort for each case, custom built for each particular subset of content. I cannot believe that this is the idea behind archival-quality XML. Not unless you are willing to store each complex renderer along with the XML it corresponds to. But clearly, this goes against the original idea of storing all the information pertaining to an article in a single XML file.

Tags like <mixed-citation>, <string-name> and <string-date> give those who care the power to keep both the article structure and the associated display information in a single XML file. This, in turn, helps keep renderers simple and reusable across a vast variety of content. And I consider this capability to be one of the strongest assets of the NLM/JATS family of DTDs.
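
To illustrate the point about simple renderers, here is a minimal Python sketch. It is not taken from any production tool. It assumes a hypothetical tagging of the CDC reference quoted further down in this thread; the choice of <collab>, <year>, <article-title>, <source>, <volume>, <issue>, <fpage> and <lpage> is only one plausible way to mark it up, and the function names render and metadata are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup of the CDC reference quoted in this thread.
# The tagging choices are an assumption made for illustration only.
CITATION = (
    '<mixed-citation publication-type="journal">'
    '<collab>US CDC</collab> <year>1990</year>. '
    '<article-title>International notes earthquake disaster: Luzon, '
    'Philippines</article-title>. '
    '<source>Mortality and Morbidity Weekly Report</source> '
    '<volume>39</volume>(<issue>34</issue>): '
    '<fpage>573</fpage>-<lpage>577</lpage>.'
    '</mixed-citation>'
)


def render(citation):
    """Display string: all text in document order, punctuation included."""
    return "".join(citation.itertext())


def metadata(citation):
    """Tagged fields only; the untagged spacing and punctuation are skipped."""
    return {child.tag: child.text for child in citation}


root = ET.fromstring(CITATION)
print(render(root))    # the reference as the publisher displayed it
print(metadata(root))  # the structured fields for linking or indexing
```

The renderer has no per-publisher punctuation rules at all: it concatenates whatever is stored. The structured fields remain available through the tags, so nothing has to be reverse-engineered later.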

Best regards,
Nikos Markantonatos
Atypon


On 02/26/2013 05:24 PM, Kaveh Bazargan wrote:
Hi Nikos

I bow to the volume of data that you have in your organization, and my
exposure to the variety of xml does not compare to yours. ;-)

I agree that for subjects with a lot of legacy citations which cannot
easily be structured, mixed citation might be better, but for STM
research where most refs are structured, I think that it would
encourage more structure. I would like us to try hard to make
something structured and only use <comment> if there is no other
option.

Also I personally think that punctuation does not belong in data
(unless there is no way of structuring). I believe that the Atypon DTD
uses the x tag which makes it easy at least to remove the punctuation,
but for me, putting the punctuation in verbatim in the data goes
against the spirit of structure.

My preference is that at least for normal journal and book citation
which has structure, the data should be pure, and devoid of textual
embellishments like en-dashes and semi-colons. These can be put in by
any intelligent renderer on the fly, and even produce the most
beautiful typesetting. But I know there are plenty of "unintelligent"
rendering engines out there that rely on a helping hand from
typographic niceties peppered in the XML. If this is any part of the
reason for leaving punctuation, then it worries me greatly.

And to show how ridiculous mixed citation can get, here is
an example from a recently published paper:

<mixed-citation>
US CDC 1990. International notes earthquake disaster: Luzon,
Philippines. Mortality and Morbidity Weekly Report 39(34): 573-577.
</mixed-citation>

This has arguably _less_ structure than a printed reference. The
latter at least has bold and italic which hint at what each item might
represent. ;-)


