RE: [jats-list] validating NLM using python, any tips?

Subject: RE: [jats-list] validating NLM using python, any tips?
From: "Maloney, Christopher (NIH/NLM/NCBI) [C]" <maloneyc@xxxxxxxxxxxxxxxx>
Date: Fri, 29 Nov 2013 15:36:51 +0000
Ian, I think the problem with your script is that `etree.DTD` doesn't know how
to resolve the external entities that are declared in the DTD.  Because the
DTDs are modularized, when you read the DTD into a string with
`r = requests.get(...)`, the string still has a lot of external entity
references like, for example,

    <!ENTITY % archivecustom-modules.ent PUBLIC
    "-//NLM//DTD Journal Archiving DTD-Specific Modules v3.0 20080202//EN"
    "archivecustom-modules3.ent"                                         >
    %archivecustom-modules.ent;

and these are not getting resolved.  If you first flatten the DTD, and read
it in from a local file, then it works.
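
A minimal sketch of the local-file approach, assuming lxml is installed and
that the flattened DTD has already been saved to disk. The function name and
the file names in the usage note are placeholders, not taken from Ian's
script:

```python
from lxml import etree


def validate_against_local_dtd(xml_path, dtd_path):
    """Validate the document at xml_path against the DTD file at dtd_path.

    Returns (is_valid, messages); messages holds lxml's validation
    errors as strings and is empty when the document is valid.
    """
    # Load the DTD by file name instead of from a string or StringIO.
    dtd = etree.DTD(file=dtd_path)
    doc = etree.parse(xml_path)
    is_valid = dtd.validate(doc)
    messages = [str(entry) for entry in dtd.error_log]
    return is_valid, messages
```

Calling `validate_against_local_dtd("file.xml", "flat.dtd")` returns a flag
plus any error messages, which a companion script could print before exiting
with a non-zero status.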

Chris
________________________________________
From: Rajagopal CV [cvr3@xxxxxxxxxxxxxxxx]
Sent: Friday, November 29, 2013 2:51 AM
To: jats-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [jats-list] validating NLM using python, any tips?

Yet another method is to use an ant script.

<project basedir="./" default="parse" name="crossplatform.script">
  <property name="NLM-DTD-resources" value="NLM-DTD-resources"/>
  <xmlcatalog id="nlm.dtds">
    <dtd publicId="-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN"
         location="${basedir}/${NLM-DTD-resources}/dtd/NLM/publishing/journalpublishing3.dtd"/>
  </xmlcatalog>
  <target name="parse">
    <echo>Validating the ${input}</echo>
    <xmlvalidate failonerror="yes" warn="yes" file="${input}">
      <xmlcatalog refid="nlm.dtds"/>
    </xmlvalidate>
  </target>
</project>

Save the above in file "build.xml" and keep the NLM DTD resources in a
folder named "NLM-DTD-resources".

ant -buildfile build.xml -Dinput file.xml

OR

ant -Dinput file.xml


The only advantage is that this is cross-platform :-)

I too use xmllint on a linux box which is very handy.
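
For a companion script that stays in python, one option is to shell out to
xmllint. This is only a sketch: it assumes xmllint is on the PATH and that
the DTD named in the document's DOCTYPE can be found (by system identifier or
through an XML catalog), and the function names are made up for illustration.
The flags are the ones Alf gave.

```python
import subprocess


def xmllint_command(xml_path):
    """Build the xmllint invocation quoted in this thread for one file."""
    return ["xmllint", "--noout", "--loaddtd", "--valid", xml_path]


def validate_with_xmllint(xml_path):
    """Run xmllint on xml_path and return (is_valid, stderr_text).

    xmllint exits with status 0 when no error was found, so any
    non-zero status is treated as a failed validation.
    """
    result = subprocess.run(
        xmllint_command(xml_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode == 0, result.stderr
```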
--
Rajagopal

On Thu, Nov 28, 2013 at 10:55 PM, Alf Eaton <eaton.alf@xxxxxxxxx> wrote:
> From the command line you can use xmllint (brew install libxml2):
>
> xmllint --noout --loaddtd --valid file.xml
>
> On 28 November 2013 16:55, Ian Mulvany <i.mulvany@xxxxxxxxxxxxxxxxx> wrote:
>> Hi All,
>>
>> I'm building a small script in python to generate NLM XML. I would
>> like a companion script to validate, preferably also in python.
>>
>> The generating script is a work in progress, but you can review it here:
>> https://github.com/elifesciences/elife-poa-xml-generation/blob/working/generate-poa-xml.py
>>
>> My initial attempt to get the NLM DTD for validation failed, the script here:
>> https://github.com/elifesciences/elife-poa-xml-generation/blob/working/validate.py
>>
>> returned
>> Traceback (most recent call last):
>>   File "validate.py", line 9, in <module>
>>     dtd = etree.DTD(StringIO(NLM_DTD))
>>   File "dtd.pxi", line 287, in lxml.etree.DTD.__init__ (src/lxml/lxml.etree.c:150450)
>>   File "dtd.pxi", line 394, in lxml.etree._parseDtdFromFilelike (src/lxml/lxml.etree.c:152160)
>>
>> Has anyone done this in python, if so do you have code you could share?
>>
>> If I can't get it to work in python, should I consider an alternative
>> route, what would you suggest?
>>
>> I'm developing on a mac using OSX Mavericks, but I could also build on
>> a linux box.
>>
>>
>> - Ian
>>
>> ---
>> Head of Technology - eLife
>> Submit now - http://submit.elifesciences.org/
>> twitter: @IanMulvany
>>
>>
>> --~------------------------------------------------------------------
>> JATS-List info and archive:  http://www.mulberrytech.com/JATS/JATS-List/
>> To unsubscribe, go to: http://lists.mulberrytech.com/jats-list/
>> or e-mail: <mailto:jats-list-unsubscribe@xxxxxxxxxxxxxxxxxxxxxx>
>> --~--
>>