Re: [jats-list] JATS to RDF Conversion

Subject: Re: [jats-list] JATS to RDF Conversion
From: "Alf Eaton eaton.alf@xxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 9 Jun 2016 08:21:39 -0000

Hi Sky,

You might be interested in the work that science.ai are doing on conversion
of JATS to Scholarly HTML, which is HTML5 + RDFa (schema.org + extensions):
http://scholarly.vernacular.io/
http://w3c.github.io/scholarly-html/
https://lists.w3.org/Archives/Public/public-scholarlyhtml/2016Mar/0006.html

It's still a work in progress, so it isn't ready for you to use yet, but
once it is, you'd be able to run any RDFa parser over that HTML to get the
data graph.

Alternatively, I have an XSLT conversion of JATS to HTML + microdata
(schema.org); again, you could run a microdata parser over the output to
extract the data graph:
https://github.com/PeerJ/jats-conversion/blob/master/src/data/xsl/jats-to-html.xsl

Either way, that's still metadata about the article, rather than markup of
the meaning of the article's content, but maybe that's enough for your
purposes?
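
To give a rough idea of what "use a microdata parser" involves, here is a
minimal sketch in Python using only the standard library. It is not the
PeerJ code and not a conforming microdata parser: the class name
MicrodataSketch, the function extract_microdata and the sample HTML are all
made up for illustration, and it assumes flat items (no nested itemscope,
no itemref, one property name per itemprop attribute).

```python
from html.parser import HTMLParser


class MicrodataSketch(HTMLParser):
    """Collect microdata items as dicts (simplified: flat items only)."""

    # Elements whose property value comes from an attribute, not from text.
    ATTR_VALUES = {"meta": "content", "a": "href", "link": "href",
                   "img": "src", "time": "datetime"}

    def __init__(self):
        super().__init__()
        self.items = []
        self._prop = None   # property whose text is currently being collected
        self._tag = None    # element that opened that property
        self._text = []

    def _add(self, prop, value):
        self.items[-1]["properties"].setdefault(prop, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # Start a new item; nesting is deliberately ignored in this sketch.
            self.items.append({"type": attrs.get("itemtype"), "properties": {}})
            return
        prop = attrs.get("itemprop")
        if prop and self.items:
            attr = self.ATTR_VALUES.get(tag)
            if attr and attr in attrs:
                self._add(prop, attrs[attr])
            else:
                self._prop, self._tag, self._text = prop, tag, []

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes the property element does not contain a same-named child.
        if self._prop is not None and tag == self._tag:
            self._add(self._prop, "".join(self._text).strip())
            self._prop = None


def extract_microdata(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical sample, not actual output of the stylesheet linked above.
SAMPLE = """
<article itemscope itemtype="http://schema.org/ScholarlyArticle">
  <h1 itemprop="name">An example article</h1>
  <meta itemprop="datePublished" content="2016-06-09">
  <a itemprop="url" href="https://example.org/articles/1">Permalink</a>
</article>
"""

items = extract_microdata(SAMPLE)
print(items)
```

Each item comes back as a type plus a dictionary of property values, which
is straightforward to turn into triples. For real documents an existing
microdata library would be the safer choice, since it will handle nested
items and the other cases this sketch skips.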

Best wishes,
Alf





On 8 June 2016 at 23:11, Sky Hester skyh@xxxxxxxx <
jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi Everyone,
>
> A well-founded and clearly documented approach to converting NISO
> JATS 1.0 (all tag sets) document metadata to RDF is described in the
> article "From Markup to Linked Data" (S. Peroni, D.A. Lapeyre, and D.
> Shotton, 2012)[1]. The authors used an XSL transformation (written in XSLT
> 2.0* by Silvio) to declare the mapping *from* XML conforming to JATS *to*
> RDF/XML conforming to SPAR, DCMI, and a couple of other open vocabularies.
>
> Many of us (myself included) need to be able to extract semantic content
> from documents conforming to scholarly publishing standards, and I think
> the work done in the article cited above is the right place to start. Has
> anyone continued this effort since the paper? That is, do you know of any
> more recent XSLT resources for converting JATS-conformant XML to RDF using
> standard scholarly ontologies?
>
> I put the same question to Debbie Lapeyre at Mulberry, one of the
> authors, and she explained the following:
>
> "
>         I have not [continued to work on it]. There were some complaints
> at the time that the mapping was good, but that the ontologies chosen were
> not the most commonly used, and that a mapping to more typically-used
> ontologies would have been better for the community.
> "
>
> Now that we have resources such as schema.org[2] and LOV[3], that may be
> less of a concern. For my purposes the ontologies chosen are sufficient,
> and OWL/RDFS subclass and equivalence axioms linking them to preferred
> ontologies can be declared later, if needed, so that queries still work.
> Then again, maybe someone has already produced a modified or more
> complete mapping using ontologies they prefer? I think the crucial work
> is in JATS coverage rather than in the names on the RDF side.
>
> Such a resource (further work on JATS2RDF) would be generally useful to
> the community.
>
> For completeness, the "Biotea"[4] paper in the Journal of Biomedical
> Semantics describes an alternative RDF transformation that is tailored
> to the PMC corpus rather than to generic JATS (found on the JATS-list
> archive[5]).
>
> -Sky Hester
>
> [1] http://www.ncbi.nlm.nih.gov/books/NBK100491/
> [2] http://schema.org/
> [3] http://lov.okfn.org/dataset/lov/
> [4] http://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-4-S1-S5
> [5] http://www.biglist.com/lists/lists.mulberrytech.com/jats-list/archives/201305/msg00028.html
>
> *I was only able to find one free implementation of XSLT 2.0, which is
> Saxon-HE.
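
The shape of the mapping discussed in the quoted message (JATS metadata in,
RDF out) can also be sketched without an XSLT 2.0 processor. The Python
below is only an illustration of the idea, not the stylesheet from the
paper: the function names, the sample document and its DOI are made up, the
choice of Dublin Core 1.1 properties and of https://doi.org/ subject IRIs is
an assumption of this sketch rather than the published mapping, and it
covers just the title and author names.

```python
import xml.etree.ElementTree as ET

# Assumption of this sketch: plain Dublin Core 1.1 properties with literals.
DC = "http://purl.org/dc/elements/1.1/"


def ntriples_literal(text):
    """Quote a string as a simple N-Triples literal."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"'


def jats_metadata_to_ntriples(jats_xml):
    """Map a few JATS <article-meta> fields to N-Triples lines."""
    root = ET.fromstring(jats_xml)
    meta = root.find("./front/article-meta")
    if meta is None:
        return []
    doi = meta.findtext("article-id[@pub-id-type='doi']")
    if not doi:
        return []  # this sketch needs a DOI to mint the subject IRI
    subject = f"<https://doi.org/{doi.strip()}>"

    triples = []
    title = meta.find("title-group/article-title")
    if title is not None:
        # itertext() flattens inline markup such as <italic>.
        text = " ".join("".join(title.itertext()).split())
        triples.append(f"{subject} <{DC}title> {ntriples_literal(text)} .")

    authors = meta.findall("contrib-group/contrib[@contrib-type='author']/name")
    for name in authors:
        given = name.findtext("given-names", default="")
        surname = name.findtext("surname", default="")
        full = " ".join(part for part in (given, surname) if part)
        triples.append(f"{subject} <{DC}creator> {ntriples_literal(full)} .")
    return triples


# Hypothetical input; the DOI is a placeholder, not a real article.
SAMPLE = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example.1</article-id>
      <title-group>
        <article-title>An <italic>example</italic> article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""

for line in jats_metadata_to_ntriples(SAMPLE):
    print(line)
```

Swapping in other vocabularies only means changing the property IRIs, which
fits the point above that the bulk of the work is in JATS coverage rather
than in the names on the RDF side.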
