Re: [jats-list] citation "year" with suffix

From: Wendell Piez <wapiez@xxxxxxxxxxxxxxx>
Date: Wed, 27 Feb 2013 10:22:08 -0500

Hi,

What Alf just said is perfectly right, except he leaves out one detail.

Assume a citation correctly marked up using element-citation, with all
the necessary pieces in place for automated rendering.

Then change it to mixed-citation, adding punctuation for a default rendering.

Your process can still decide to ignore everything in the
mixed-citation but the element content, and give it whatever rendering
you want. (Assuming your markup is as good as it presumably has to be
if you relied on element-citation, that is.)

Indeed, you can implement heuristic logic to determine whether all the
elements you need to render a citation cleanly (replacing its
punctuation with your own) are present, falling back on the
punctuation given if they're not. That gives you a more graceful failure.
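
The fallback heuristic described above could look roughly like the
following. This is only a minimal sketch in Python, not anything from
an actual pipeline: the REQUIRED list, the render_citation name and
the output punctuation are all made up for illustration, and a real
process would need per-publication-type rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical house rule: the elements our own journal style needs.
REQUIRED = ("surname", "year", "article-title", "source", "volume", "fpage")


def render_citation(xml_text):
    """Render with our punctuation if fully tagged, else keep the supplied text."""
    cit = ET.fromstring(xml_text)
    found = {}
    for tag in REQUIRED:
        el = cit.find(".//" + tag)  # first occurrence at any depth
        text = "".join(el.itertext()).strip() if el is not None else ""
        if text:
            found[tag] = text
    if len(found) == len(REQUIRED):
        # Everything we need is tagged: ignore the text between elements
        # and apply our own (made-up) punctuation.
        return "%s (%s). %s. %s %s: %s." % tuple(found[t] for t in REQUIRED)
    # Something is missing: fall back on the citation as supplied,
    # with runs of whitespace collapsed.
    return " ".join("".join(cit.itertext()).split())


tagged = (
    "<mixed-citation><string-name><surname>Smith</surname></string-name>, "
    "<year>2013a</year>; <article-title>On suffixes</article-title>, "
    "<source>J. Examples</source> <volume>1</volume>(2), "
    "<fpage>3</fpage>-<lpage>9</lpage></mixed-citation>"
)
untagged = "<mixed-citation>US CDC 1990. International notes.</mixed-citation>"

print(render_citation(tagged))    # re-punctuated from the tagged elements
print(render_citation(untagged))  # supplied text passed through unchanged
```

The same input, given as element-citation with a piece missing, would
leave the fallback branch with nothing usable to print.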

I agree with Nikos in this debate. Information that is missing is by
definition unavailable for use. mixed-citation is not only more
flexible and adaptable to different situations in data acquisition; it
also provides your workflow with escape hatches that are very useful
in a world that is not yet perfect. Indeed, this is a classic case
where the perfect can be the enemy of the good.

element-citation is appropriate if you never had any punctuation and
have no good way to get it: it then serves as an indicator that the
display processor is on its own.

Cheers, Wendell


On Wed, Feb 27, 2013 at 6:45 AM, Alf Eaton <eaton.alf@xxxxxxxxx> wrote:
> I think it depends on whether the contents of <*-citation> are being
> treated as a string to which some semantics are added
> (mixed-citation), or as a set of data with no pre-defined rendering
> (element-citation). The former being more useful if you're marking up
> legacy content, and the latter being more desirable if you're creating
> reference lists from scratch.
>
> I'd like to be able to choose how the citations are rendered at
> display time, if at all possible, which basically makes
> element-citation a requirement for our newly-generated content; if
> using mixed-citation, any non-marked-up information would be lost.
>
> If it does turn out that lots of citations are having to be entered as
> plain text in a <comment>, though, then maybe mixed-citation will turn
> out to be more appropriate for those cases...
>
> Alf
>
> On 27 February 2013 10:41, Nikos Markantonatos <nikos@xxxxxxxxxx> wrote:
>> Hi Kaveh,
>>
>>
>>> I agree that for subjects with a lot of legacy citations which cannot
>>> easily be structured, mixed citation might be better, but for STM
>>> research where most refs are structured, I think that it would
>>> encourage more structure.
>>
>> As others have argued already, <element-citation> would simply encourage
>> more tag abuse and hidden encoding problems. <mixed-citation> only encodes
>> whatever metadata is known for a citation and leaves everything else,
>> including spacing and punctuation, unmarked. Clearly, we encourage XML
>> creators to supply citations as fully marked up as possible, but there are
>> practical reasons why this may not be possible.
>>
>>
>>> My preference is that at least for normal journal and book citation
>>> which has structure, the data should be pure, and devoid of textual
>>> embellishments like en-dashes and semi-colons. These can be put in by
>>> any intelligent renderer on the fly, and even produce the most
>>> beautiful typesetting.
>>
>> Textual embellishments may appear to be less crucial than the citation
>> metadata, yet they form an important part of an archival quality XML.
>> Renderers, no matter how intelligent they are, will never be able to
>> reinstate lost information. And this is because lost information is not
>> consistent and there are always good reasons behind that. Over the years and
>> over the course of dozens of legacy content migration projects, we have
>> found monstrous implicit rules hidden behind the idea of maintaining "pure
>> data". We had to apply reverse engineering techniques and inquire with
>> people who had originally encoded the information.
>>
>> Bottom line is that if you employed an intelligent renderer to display that
>> content, it would be a multi-week effort for each case, custom built for
>> each particular subset of content. I cannot believe that this is the idea
>> behind an archival quality XML. Not unless you are willing to associate each
>> complex renderer and store it along with the XML it corresponds to. But
>> clearly, this goes against the original idea of storing all the information
>> pertaining to an article in a single XML file.
>>
>> Tags like <mixed-citation>, <string-name> and <string-date> offer the power
>> to those who care to keep both the article structure and the associated
>> display information in a single XML file. This, in turn, helps keep
>> renderers simple and reusable across a vast variety of content. And I
>> consider this capability to be one of the strongest assets of the NLM/JATS
>> family of DTDs.
>>
>>
>> Best regards,
>> Nikos Markantonatos
>> Atypon
>>
>>
>> On 02/26/2013 05:24 PM, Kaveh Bazargan wrote:
>>>
>>> Hi Nikos
>>>
>>> I bow to the volume of data that you have in your organization, and my
>>> exposure to the variety of xml does not compare to yours. ;-)
>>>
>>> I agree that for subjects with a lot of legacy citations which cannot
>>> easily be structured, mixed citation might be better, but for STM
>>> research where most refs are structured, I think that it would
>>> encourage more structure. I would like us to try hard to make
>>> something structured and only use <comment> if there is no other
>>> option.
>>>
>>> Also I personally think that punctuation does not belong in data
>>> (unless there is no way of structuring). I believe that the Atypon DTD
>>> uses the x tag which makes it easy at least to remove the punctuation,
>>> but for me, putting the punctuation in verbatim in the data goes
>>> against the spirit of structure.
>>>
>>> My preference is that at least for normal journal and book citation
>>> which has structure, the data should be pure, and devoid of textual
>>> embellishments like en-dashes and semi-colons. These can be put in by
>>> any intelligent renderer on the fly, and even produce the most
>>> beautiful typesetting. But I know there are plenty of "unintelligent"
>>> rendering engines out there that rely on a helping hand from
>>> typographic niceties peppered in the XML. If this is any part of the
>>> reason for leaving punctuation, then it worries me greatly.
>>>
>>> And just an example of how ridiculous mixed citation can get, here is
>>> an example from a recently published paper:
>>>
>>> <mixed-citation>
>>> US CDC 1990. International notes earthquake disaster: Luzon,
>>> Philippines. Mortality and Morbidity Weekly Report 39(34): 573-577.
>>> </mixed-citation>
>>>
>>> This has arguably _less_ structure than a printed reference. The
>>> latter at least has bold and italic which hint at what each item might
>>> represent. ;-)
>>>
>>
>>
>



--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
