Subject: Re: [jats-list] convert PDF to JATS or BITS XML
From: "Kevin Hawkins kevin.s.hawkins@xxxxxxxxxxxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 7 May 2014 13:47:32 -0000
I don't have experience with this myself, but here is a list of tools I'm aware of that convert PDF to an XML format of some sort:
* GROBID -- uses heuristics to convert a vector PDF into TEI. See the source code ( https://github.com/kermitt2/grobid ) and demo site ( http://grobid.no-ip.org/ ).
* pdfx ( http://pdfx.cs.man.ac.uk/ ) -- uses heuristics to convert a vector PDF into pdfx-xml. Alex Garnett with the Public Knowledge Project has a pipeline that converts to JATS.
* LA-PDFText ( http://code.google.com/p/lapdftext/ ) -- turns PDF into some sort of XML
* Merops ( http://www.shabash.net/merops/default.html ) -- closed-source and expensive solution for generating NLM XML, among other formats
* pdf2htmlEX ( https://github.com/coolwanglu/pdf2htmlEX ) -- If I understood Liza Daly correctly at AAUP 2013, it creates HTML5 with SVG (nonreflowable output).
* see the "Tools and Ideas" section of https://web.archive.org/web/20130921075854/http://scholrev.org/hackathon
Has anybody had experience converting PDF to JATS or BITS XML? Any suggestions for conversion tools other than pdfx?
Thanks,
Wei