Re: [jats-list] convert PDF to JATS or BITS XML

Subject: Re: [jats-list] convert PDF to JATS or BITS XML
From: "Evan Owens eowens@xxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 11 Jun 2014 14:02:53 -0000
Another PDF-to-XML project, though not JATS-specific.

The University of Manchester (UK) has PDF-to-XML software called PDFX.
CrossRef is looking at using this technology for a metadata extraction service
for small publishers who don't produce full-text XML. Here's a conference
paper about this technology:

https://www.escholar.manchester.ac.uk/uk-ac-man-scw:218911

I heard in March that Manchester hasn't made the software open source yet, but
they do seem to have a web service at http://pdfx.cs.man.ac.uk/ so one could
try it out there.

Evan Owens, AIP Publishing LLC


-----Original Message-----
From: Alexander Garcia Castro alexgarciac@xxxxxxxxx
[mailto:jats-list-service@xxxxxxxxxxxxxxxxxxxxxx]
Sent: Wednesday, June 11, 2014 9:40 AM
To: jats-list
Subject: Re: [jats-list] convert PDF to JATS or BITS XML

For academic papers, given the heterogeneity in formats and in the ways the
final PDF is produced, the one tool that will give you clean, usable output is
Crocodoc. I run Jailbreaking the PDF, a workshop aiming to get usable text
from PDF. Here, by usable I mean clean, with no mistakes, and with bold,
italics, footnotes, bibliographic references, tables, figures, etc. ready to
be used for whatever purpose. Results were not encouraging. Crocodoc gives you
HTML5, clean and reusable.

On Wed, May 7, 2014 at 7:52 AM, Wei Zhao w.zhao@xxxxxxxxxxx
<jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Has anybody had experience converting PDF to JATS or BITS XML? Any
> suggestions for conversion tools other than PDFX?
>
> Thanks,
>
> Wei
>
> --
> Wei Zhao
> Metadata Librarian
> OCUL/Scholars Portal
> Phone: 416 946-0951
> Fax: 416 978-1668
> w.zhao@xxxxxxxxxxx
>



--
Alexander Garcia
http://www.alexandergarcia.name/
http://www.usefilm.com/photographer/75943.html
http://www.linkedin.com/in/alexgarciac
