[jats-list] JATS to RDF Conversion

Subject: [jats-list] JATS to RDF Conversion
From: "Sky Hester skyh@xxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 8 Jun 2016 22:11:37 -0000

Hi Everyone,

A well-founded and clearly documented approach to converting NISO JATS 1.0
document metadata (all tag sets) to RDF is described in the article "From
Markup to Linked Data" (S. Peroni, D.A. Lapeyre, and D. Shotton, 2012)[1].
The authors used an XSL transformation (written in XSLT 2.0* by Silvio) to
declare the mapping *from* XML conforming to JATS *to* RDF/XML conforming to
SPAR, DCMI, and a couple of other open vocabularies.
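
To make the idea of such a mapping concrete, here is a minimal sketch in
Python (not the XSLT from the paper, and not its actual mapping). It assumes
a heavily trimmed, hypothetical JATS fragment and maps only the DOI and the
article title to Dublin Core terms; the sample document, the function names,
and the choice of dcterms:identifier and dcterms:title are my own
illustrative assumptions.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical, heavily trimmed JATS front matter, used only for illustration.
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example</article-id>
      <title-group>
        <article-title>An <italic>Example</italic> Article</article-title>
      </title-group>
    </article-meta>
  </front>
</article>"""


def jats_metadata_to_triples(xml_text):
    """Return (subject, predicate, literal) triples for a few metadata fields."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    if meta is None:
        return []

    doi = meta.findtext("article-id[@pub-id-type='doi']")
    # Assumption: use the DOI resolver URL as the article IRI when a DOI exists.
    subject = ("https://doi.org/" + doi.strip()) if doi else "urn:example:article"

    triples = []
    if doi:
        triples.append((subject, DCTERMS + "identifier", "doi:" + doi.strip()))

    title_el = meta.find("title-group/article-title")
    if title_el is not None:
        # itertext() flattens inline markup such as <italic>; split/join
        # normalizes whitespace.
        title = " ".join("".join(title_el.itertext()).split())
        triples.append((subject, DCTERMS + "title", title))
    return triples


def to_ntriples(triples):
    """Serialize triples with plain-literal objects as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('<%s> <%s> "%s" .' % (s, p, escaped))
    return "\n".join(lines)


print(to_ntriples(jats_metadata_to_triples(SAMPLE_JATS)))
```

A real conversion would need to cover far more of JATS (contributors,
journal metadata, references, and so on), which is where the bulk of the
effort lies.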

Many of us (I certainly do) need to be able to extract semantic content from
documents conforming to scholarly publishing standards, and I think the work
done in the article cited above is the right place to start. Has anyone
continued this effort since the paper? That is, do you know of any more recent
XSLT resources for converting JATS-conformant XML to RDF using standard
scholarly ontologies?

I put the same question to Debbie Lapeyre at Mulberry, one of the authors,
and she explained the following:

"I have not [continued to work on it]. There were some complaints at the time
that the mapping was good, but that the ontologies chosen were not the most
commonly used, and that a mapping to more typically-used ontologies would have
been better for the community."

Now that we have resources such as schema.org[2] and LOV[3], this may be less
of a concern. For my purposes, the ontologies chosen are sufficient; if
needed, OWL/RDFS subclass and equivalence declarations linking them to
preferred ontologies can be added later, so that queries written against the
preferred terms still match. Then again, maybe someone has already produced a
modified or more complete mapping using ontologies they prefer? I think the
crucial work is in JATS coverage rather than in the names on the RDF side.
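
As a rough illustration of why the choice of names matters less than
coverage, the sketch below applies RDFS-style subclass inference over plain
Python tuples, with no RDF library involved. The single subclass declaration
(fabio:JournalArticle under schema:ScholarlyArticle) and the ex:paper1
resource are hypothetical examples of my own, not a published alignment; an
equivalence could be modelled the same way by declaring the subclass
relation in both directions.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical data: one converted article plus one alignment declaration.
DATA = {
    ("ex:paper1", RDF_TYPE, "fabio:JournalArticle"),
    ("fabio:JournalArticle", SUBCLASS_OF, "schema:ScholarlyArticle"),
}


def infer_types(triples):
    """Add the rdf:type triples implied by rdfs:subClassOf (transitively)."""
    superclasses = {}
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            superclasses.setdefault(s, set()).add(o)

    inferred = set(triples)
    pending = [(s, o) for s, p, o in triples if p == RDF_TYPE]
    while pending:
        instance, cls = pending.pop()
        for sup in superclasses.get(cls, ()):
            new_triple = (instance, RDF_TYPE, sup)
            if new_triple not in inferred:  # also guards against cycles
                inferred.add(new_triple)
                pending.append((instance, sup))
    return inferred


def instances_of(triples, cls):
    """Answer a simple 'all instances of cls' query over the given triples."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)


print(instances_of(infer_types(DATA), "schema:ScholarlyArticle"))
```

Without the inference step, the same query over DATA finds nothing, since
the article is only typed with the original vocabulary.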

Such a resource (further work on JATS2RDF) would be generally useful to the
community.

In the interest of completeness, the "Biotea"[4] paper in the Journal of
Biomedical Semantics describes an alternative RDF transformation that is
tailored more to the PMC corpus than to generic JATS (found via the JATS-list
archive[5]).

-Sky Hester

[1] http://www.ncbi.nlm.nih.gov/books/NBK100491/
[2] http://schema.org/
[3] http://lov.okfn.org/dataset/lov/
[4] http://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-4-S1-S5
[5] http://www.biglist.com/lists/lists.mulberrytech.com/jats-list/archives/201305/msg00028.html

*I was only able to find one free implementation of XSLT 2.0, which is
Saxon-HE.
