Re: [jats-list] How/When do you produce a JATS-XML version of you publication within your publication workflow

From: Kevin Hawkins <kevin.s.hawkins@xxxxxxxxxxxxxxxxxx>
Date: Wed, 24 Oct 2012 22:15:58 -0400
Alf,

On 10/23/12 10:10 AM, Alf Eaton wrote:
On 21 October 2012 17:26, Kevin Hawkins
<kevin.s.hawkins@xxxxxxxxxxxxxxxxxx> wrote:
You've already decided to use OJS, so the first question is whether to bother producing XML at all.

This raises an interesting question, which I was thinking about during JATS-CON. If an article is authored in HTML, it's not entirely straightforward to convert the body of the article to JATS markup. In that case, is it possible to use JATS for the metadata (front and back) of the article, but maintain the body of the article in the original source format?

The source format might be HTML (in this case), PDF (as in the J-STAGE
converter <http://www.ncbi.nlm.nih.gov/books/NBK100490/> which only
extracts metadata so far), an image (from OCR), or even something
else.

An analogous element is Atom's "content" element, which has a "type"
attribute[1] containing the MIME type of the content within the
element.

The JATS "body" element only has a "specific-use" attribute, and the
element definition would have to be extended to allow non-JATS
content, so, instead, maybe a solution is to provide the main content
of the article in a separate file, and link to it within the body
element, like this?:

<article xmlns:xlink="http://www.w3.org/1999/xlink">
   <front>[...]</front>
   <body>
     <media mimetype="text" mime-subtype="html" xlink:href="index.html" xlink:actuate="onLoad"/>
   </body>
   <back>[...]</back>
</article>

[1] http://tools.ietf.org/html/rfc4287.html#section-4.1.3.1
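As a rough illustration of how a downstream tool might consume the <media>-in-<body> wrapper proposed above, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes no-namespace JATS elements and the XLink namespace for href/actuate. The function name external_body_links is hypothetical, and this is not an established JATS processing convention. It collects the linked file and its MIME type so that a renderer could fetch the HTML (or PDF, image, etc.) in place of a tagged body.

```python
import xml.etree.ElementTree as ET

# Standard XLink namespace URI; JATS elements themselves are in no namespace.
XLINK = "http://www.w3.org/1999/xlink"


def external_body_links(jats_xml: str) -> list[dict]:
    """Hypothetical helper: list every <media> child of <body> with its
    href, combined MIME type, and xlink:actuate value."""
    root = ET.fromstring(jats_xml)
    links = []
    body = root.find("body")
    if body is None:
        return links  # no <body> at all, so nothing to resolve
    for media in body.findall("media"):
        mimetype = media.get("mimetype", "")
        subtype = media.get("mime-subtype", "")
        links.append({
            # ElementTree stores namespaced attributes as "{uri}localname"
            "href": media.get(f"{{{XLINK}}}href"),
            "mime": f"{mimetype}/{subtype}" if subtype else mimetype,
            "actuate": media.get(f"{{{XLINK}}}actuate"),
        })
    return links


SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>[...]</front>
  <body>
    <media mimetype="text" mime-subtype="html" xlink:href="index.html"
           xlink:actuate="onLoad"/>
  </body>
  <back>[...]</back>
</article>"""

if __name__ == "__main__":
    for link in external_body_links(SAMPLE):
        print(link)  # {'href': 'index.html', 'mime': 'text/html', 'actuate': 'onLoad'}
```

A consumer built this way treats an xlink:actuate="onLoad" link as "embed this content as the body". One that understands only standard JATS would still see valid front and back metadata.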

This is a creative use of JATS, and while not exactly sanctioned in the latest version of the spec, it's interesting to hear that someone is contemplating uses that incorporate a <body> much less structured than a JATS <body> typically is. That is, you want to do something like dump in raw OCR or just a facsimile of the document.


This all reminds me very much of Levels 1 through 3 of the Best Practices for TEI in Libraries (http://purl.oclc.org/NET/teiinlibraries), which Wendell mentioned during JATS-Con. The document explains a philosophy of digitization that assumes a large volume of content, where you don't have the resources to fully encode the bodies of the documents. Perhaps you will find the approach there illuminating.

--Kevin
