Subject: Re: [jats-list] convert PDF to JATS or BITS XML
From: "Alexander Garcia Castro alexgarciac@xxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 12 Jun 2014 18:57:14 -0000
Hi Kevin, Crocodoc was not tested at the time of the pdfjailbreak
event; I started to work with Crocodoc in January. By "not
encouraging" I mean that, for what I needed, all the tools we tested
somehow fell short. I needed a perfect extraction of the layout as
well as of the text, with no mistakes. Of all the tools I have tried,
Crocodoc is the only one that meets that bar. Although it is a
commercial product, it is fairly easy to use for testing purposes.
There are some issues with Crocodoc, but so far so good.

On Thu, Jun 12, 2014 at 12:30 PM, Kevin Hawkins
kevin.s.hawkins@xxxxxxxxxxxxxxxxxx
<jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Alex,
>
> Now that I've googled the phrase "Jailbreaking the PDF", I see that the link
> I suggested in May:
>
> https://web.archive.org/web/20130921075854/http://scholrev.org/hackathon
>
> can now be found here:
>
> http://pdfjailbreak.com/tools
>
> Still, I'm confused about which "results were not encouraging".  Are you
> speaking about GROBID in particular (for which I resurrected this thread),
> or all tools except Crocodoc?  Is there a reason that Crocodoc is not listed
> at http://pdfjailbreak.com/tools ?
>
> --Kevin
>
>
> On 6/11/14 8:39 AM, Alexander Garcia Castro alexgarciac@xxxxxxxxx wrote:
>>
>> For academic papers, given the heterogeneity of formats and of the ways
>> the final PDF is produced, the one tool that will give you clean, usable
>> output is Crocodoc. I ran Jailbreaking the PDF, a workshop aimed at
>> getting usable text out of PDFs. By usable I mean clean, with no
>> mistakes, and with bold, italics, footnotes, bibliographic references,
>> tables, figures, etc. ready to be used for whatever purpose. The results
>> were not encouraging. Crocodoc gives you HTML5, clean and reusable.
>>
>> On Wed, May 7, 2014 at 7:52 AM, Wei Zhao w.zhao@xxxxxxxxxxx
>> <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Has anybody had experience converting PDF to JATS or BITS XML? Any
>>> suggestions for conversion tools other than pdfx?
>>>
>>> Thanks,
>>>
>>> Wei
>>>
>>> --
>>> Wei Zhao
>>> Metadata Librarian
>>> OCUL/Scholars Portal
>>> Phone: 416 946-0951
>>> Fax: 416 978-1668
>>> w.zhao@xxxxxxxxxxx
>>>

-- 
Alexander Garcia
http://www.alexandergarcia.name/
http://www.usefilm.com/photographer/75943.html
http://www.linkedin.com/in/alexgarciac
