Re: [jats-list] convert PDF to JATS or BITS XML

Subject: Re: [jats-list] convert PDF to JATS or BITS XML
From: "Kevin Hawkins kevin.s.hawkins@xxxxxxxxxxxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 11 Jun 2014 13:13:17 -0000
FYI, I just discovered that the GROBID demo has moved to
http://scite-it.eu/ .  --Kevin

On 5/7/14 8:47 AM, Kevin Hawkins wrote:
I don't have experience, but here is a list of tools I'm aware of that
convert PDF to an XML format of some sort:

* GROBID -- uses heuristics to convert a vector PDF into TEI. See
source code ( https://github.com/kermitt2/grobid ) and demo site (
http://grobid.no-ip.org/ ).

* pdf2xml ( https://sourceforge.net/projects/pdf2xml/ )

* pdfx ( http://pdfx.cs.man.ac.uk/ ) -- uses heuristics to convert a
vector PDF into "pdfx-xml".  Alex Garnett with the Public Knowledge
Project has a pipeline that converts to JATS.

* LA-PDFText ( http://code.google.com/p/lapdftext/ ) --  turns PDF into
some sort of XML

* Merops ( http://www.shabash.net/merops/default.html ) -- closed-source
and expensive solution for generating NLM XML, among other formats

* pdf2htmlEX ( https://github.com/coolwanglu/pdf2htmlEX ) -- If I
understood Liza Daly correctly at AAUP 2013, it creates HTML5 with SVG
(nonreflowable output).

* see the "Tools and Ideas" section of
https://web.archive.org/web/20130921075854/http://scholrev.org/hackathon
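
Since several of these tools emit TEI or another intermediate XML rather than JATS, a second mapping step is usually needed. Below is a minimal sketch of that step in Python, assuming a TEI header shaped roughly like the sample in the code (title in titleStmt, authors under analytic). The function name, the sample document, and the choice of elements are illustrative assumptions, not the output of any particular tool and not the Public Knowledge Project pipeline. It maps only the title and author names into a JATS front/article-meta skeleton.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this namespace; the JATS elements built below have none.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample input: a hand-written TEI header, not real tool output.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
    <titleStmt><title>An Example Article</title></titleStmt>
    <sourceDesc><biblStruct><analytic>
      <author><persName>
        <forename>Ada</forename><surname>Lovelace</surname>
      </persName></author>
    </analytic></biblStruct></sourceDesc>
  </fileDesc></teiHeader>
</TEI>"""


def tei_header_to_jats_front(tei_xml: str) -> ET.Element:
    """Build a skeletal JATS <article> from the title and authors of a TEI header."""
    tei = ET.fromstring(tei_xml)

    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")

    # Title: TEI titleStmt/title -> JATS title-group/article-title.
    title = tei.find(".//tei:titleStmt/tei:title", TEI_NS)
    title_group = ET.SubElement(meta, "title-group")
    article_title = ET.SubElement(title_group, "article-title")
    article_title.text = (title.text or "").strip() if title is not None else ""

    # Authors: TEI author/persName -> JATS contrib/name (surname, given-names).
    contrib_group = ET.SubElement(meta, "contrib-group")
    for pers in tei.findall(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        contrib = ET.SubElement(contrib_group, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = pers.findtext("tei:surname", "", TEI_NS)
        ET.SubElement(name, "given-names").text = pers.findtext("tei:forename", "", TEI_NS)

    return article


if __name__ == "__main__":
    jats = tei_header_to_jats_front(SAMPLE_TEI)
    print(ET.tostring(jats, encoding="unicode"))
```

A real conversion would also have to handle affiliations, abstracts, body sections and references, and the result should be validated against the JATS DTD or schema.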

--Kevin

On 5/7/14 7:52 AM, Wei Zhao w.zhao@xxxxxxxxxxx wrote:
Does anybody have experience converting PDF to JATS or BITS XML? Any
suggestions for conversion tools other than pdfx?

Thanks,

Wei
