Subject: Re: [jats-list] convert PDF to JATS or BITS XML
From: "Kevin Hawkins kevin.s.hawkins@xxxxxxxxxxxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 11 Jun 2014 13:13:17 -0000
FYI, I just discovered that the GROBID demo has moved to http://scite-it.eu/ . --Kevin
I don't have experience with this myself, but here is a list of tools I'm aware of that convert PDF to an XML format of some sort:
* GROBID -- uses heuristics to convert a vector PDF into TEI. See the source code ( https://github.com/kermitt2/grobid ) and demo site ( http://grobid.no-ip.org/ ).
* pdf2xml ( https://sourceforge.net/projects/pdf2xml/ )
* pdfx ( http://pdfx.cs.man.ac.uk/ ) -- uses heuristics to convert a vector PDF into "pdfx-xml". Alex Garnett with the Public Knowledge Project has a pipeline that converts to JATS.
* LA-PDFText ( http://code.google.com/p/lapdftext/ ) -- turns PDF into some sort of XML
* Merops ( http://www.shabash.net/merops/default.html ) -- closed-source and expensive solution for generating NLM XML, among other formats
* pdf2htmlEX ( https://github.com/coolwanglu/pdf2htmlEX ) -- If I understood Liza Daly correctly at AAUP 2013, it creates HTML5 with SVG (nonreflowable output).
* See the "Tools and Ideas" section of https://web.archive.org/web/20130921075854/http://scholrev.org/hackathon
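Since several of these tools emit TEI or another intermediate XML rather than JATS, a second mapping step is usually needed. Below is a minimal sketch of that step in Python, assuming TEI input with the title at teiHeader/fileDesc/titleStmt/title. The function name `tei_title_to_jats` and the sample document are hypothetical, and the result is only a bare skeleton, not a valid JATS article. It is not taken from GROBID or from any of the pipelines mentioned above.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; the sample below is made up for illustration.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_title_to_jats(tei_xml: str) -> str:
    """Copy the TEI title into a bare JATS-style <article> skeleton (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    title_el = root.find(
        "./tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS
    )
    # Flatten any inline markup inside the title to plain text.
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""

    # Build article/front/article-meta/title-group/article-title.
    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    group = ET.SubElement(meta, "title-group")
    ET.SubElement(group, "article-title").text = title
    return ET.tostring(article, encoding="unicode")


SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>An Example Article</title>
  </titleStmt></fileDesc></teiHeader>
</TEI>"""

print(tei_title_to_jats(SAMPLE_TEI))
```

A real conversion would also have to map authors, abstract, body sections, and references, and would more likely be written in XSLT, but the shape of the problem is the same.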
--Kevin
On 5/7/14 7:52 AM, Wei Zhao w.zhao@xxxxxxxxxxx wrote:
Has anybody had experience converting PDF to JATS or BITS XML? Any suggestions for conversion tools other than pdfx?
Thanks,
Wei