Subject: RE: [jats-list] validating NLM using python, any tips?
From: "Maloney, Christopher (NIH/NLM/NCBI) [C]" <maloneyc@xxxxxxxxxxxxxxxx>
Date: Fri, 29 Nov 2013 15:36:51 +0000

Ian, I think the problem with your script is that `etree.DTD` doesn't know how
to resolve the external entities that are declared in the DTD. Because the
DTDs are modularized, when you read the DTD into a string with `r =
requests.get(...)`, the string still has a lot of external entity references
like, for example,

    <!ENTITY % archivecustom-modules.ent PUBLIC
    "-//NLM//DTD Journal Archiving DTD-Specific Modules v3.0 20080202//EN"
    "archivecustom-modules3.ent" >
    %archivecustom-modules.ent;

and these are not getting resolved. If you first flatten the DTD, and read it
in from a local file, then it works.
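
The flatten-then-validate approach described above might be sketched in Python as follows. This is a naive illustration, not the script discussed in this thread: `flatten_dtd` and `validate` are hypothetical helper names, the regular expressions only handle plain external parameter entity declarations (no catalog lookup, no awareness of comments), and each module file is assumed to sit at its system identifier relative to the DTD that declares it. The validation step uses lxml, which must be installed separately.

```python
import re
from pathlib import Path

# External parameter entity declarations, e.g.
#   <!ENTITY % name PUBLIC "public-id" "system-id" >
#   <!ENTITY % name SYSTEM "system-id" >
ENTITY_DECL = re.compile(
    r'<!ENTITY\s+%\s+(?P<name>[^\s]+)\s+'
    r'(?:PUBLIC\s+(?:"[^"]*"|\'[^\']*\')\s+|SYSTEM\s+)'
    r'(?:"(?P<dq>[^"]*)"|\'(?P<sq>[^\']*)\')\s*>'
)
# Parameter entity references such as %name;
ENTITY_REF = re.compile(r'%([^\s;%"\']+);')
# An optional text declaration at the very start of a module file.
TEXT_DECL = re.compile(r'\A\s*<\?xml\b.*?\?>', re.S)


def flatten_dtd(path):
    """Return the DTD at `path` with external parameter entities inlined.

    Hypothetical helper: assumes every system identifier is a file path
    relative to the declaring file, and raises FileNotFoundError otherwise.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8")

    # Map each external entity name to its file; the first declaration wins.
    targets = {}
    for m in ENTITY_DECL.finditer(text):
        system_id = m.group("dq") if m.group("dq") is not None else m.group("sq")
        targets.setdefault(m.group("name"), path.parent / system_id)

    def inline(match):
        target = targets.get(match.group(1))
        if target is None:
            return match.group(0)  # internal entity: keep the reference
        # Recurse so that modules which pull in other modules are flattened too.
        return TEXT_DECL.sub("", flatten_dtd(target))

    # Remove the external declarations, then expand their references in place.
    text = ENTITY_DECL.sub("", text)
    return ENTITY_REF.sub(inline, text)


def validate(xml_path, flat_dtd_path):
    """Validate a document against a flattened local DTD; return error strings."""
    from lxml import etree  # imported lazily so flatten_dtd works without lxml

    dtd = etree.DTD(str(flat_dtd_path))  # load from a local file, not a string
    doc = etree.parse(str(xml_path))
    if dtd.validate(doc):
        return []
    return [str(entry) for entry in dtd.error_log.filter_from_errors()]
```

For example, `Path("flat.dtd").write_text(flatten_dtd("journalpublishing3.dtd"))` followed by `validate("file.xml", "flat.dtd")` would return an empty list for a valid document (`flat.dtd` is a made-up output name). A dedicated flattening tool is likely to be more robust than this sketch for the full set of NLM modules.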
Chris
________________________________________
From: Rajagopal CV [cvr3@xxxxxxxxxxxxxxxx]
Sent: Friday, November 29, 2013 2:51 AM
To: jats-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [jats-list] validating NLM using python, any tips?

Yet another method is to use an ant script.
<project basedir="./" default="parse" name="crossplatform.script">
  <property name="NLM-DTD-resources" value="NLM-DTD-resources"/>
  <xmlcatalog id="nlm.dtds">
    <dtd publicId="-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN"
         location="${basedir}/${NLM-DTD-resources}/dtd/NLM/publishing/journalpublishing3.dtd"/>
  </xmlcatalog>
  <target name="parse">
    <echo>Validating the ${input}</echo>
    <xmlvalidate failonerror="yes" warn="yes" file="${input}">
      <xmlcatalog refid="nlm.dtds"/>
    </xmlvalidate>
  </target>
</project>
Save the above in a file named "build.xml" and keep the NLM DTD resources in a
folder named "NLM-DTD-resources".

ant -buildfile build.xml -Dinput=file.xml

OR

ant -Dinput=file.xml

The only advantage is that this is cross-platform :-)

I too use xmllint on a linux box which is very handy.
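
Since the original question asked for a Python companion script, one option is to call xmllint from Python. The following is a minimal sketch, assuming `xmllint` is on the PATH and that the document's DOCTYPE points at a DTD xmllint can reach. The helper names `xmllint_command` and `is_valid` are made up for illustration, and the flags are the ones quoted in this thread.

```python
import subprocess


def xmllint_command(xml_path):
    """Build the xmllint invocation quoted in this thread for one file."""
    return ["xmllint", "--noout", "--loaddtd", "--valid", str(xml_path)]


def is_valid(xml_path):
    """Run xmllint and return (ok, messages).

    xmllint exits with a non-zero status when parsing or validation fails
    and writes its diagnostics to stderr. Raises FileNotFoundError if the
    xmllint executable is not installed.
    """
    result = subprocess.run(
        xmllint_command(xml_path),
        capture_output=True,  # requires Python 3.7 or later
        text=True,
    )
    return result.returncode == 0, result.stderr
```

Calling `is_valid("file.xml")` would then give a boolean plus any messages xmllint printed, which a generating script could check before writing its output.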
--
Rajagopal
On Thu, Nov 28, 2013 at 10:55 PM, Alf Eaton <eaton.alf@xxxxxxxxx> wrote:
> From the command line you can use xmllint (brew install libxml2):
>
> xmllint --noout --loaddtd --valid file.xml
>
> On 28 November 2013 16:55, Ian Mulvany <i.mulvany@xxxxxxxxxxxxxxxxx> wrote:
>> Hi All,
>>
>> I'm building a small script in python to generate NLM XML. I would
>> like a companion script to validate, preferably also in python.
>>
>> The generating script is a work in progress, but you can review it here:
>> https://github.com/elifesciences/elife-poa-xml-generation/blob/working/generate-poa-xml.py
>>
>> My initial attempt to get the NLM DTD for validation failed, the script here:
>> https://github.com/elifesciences/elife-poa-xml-generation/blob/working/validate.py
>>
>> returned
>> Traceback (most recent call last):
>>   File "validate.py", line 9, in <module>
>>     dtd = etree.DTD(StringIO(NLM_DTD))
>>   File "dtd.pxi", line 287, in lxml.etree.DTD.__init__ (src/lxml/lxml.etree.c:150450)
>>   File "dtd.pxi", line 394, in lxml.etree._parseDtdFromFilelike (src/lxml/lxml.etree.c:152160)
>>
>> Has anyone done this in python, if so do you have code you could share?
>>
>> If I can't get it to work in python, should I consider an alternative
>> route, what would you suggest?
>>
>> I'm developing on a mac using OSX Mavericks, but I could also build on
>> a linux box.
>>
>>
>> - Ian
>>
>> ---
>> Head of Technology - eLife
>> Submit now - http://submit.elifesciences.org/
>> twitter: @IanMulvany
>>
>>
>> --~------------------------------------------------------------------
>> JATS-List info and archive: http://www.mulberrytech.com/JATS/JATS-List/
>> To unsubscribe, go to: http://lists.mulberrytech.com/jats-list/
>> or e-mail: <mailto:jats-list-unsubscribe@xxxxxxxxxxxxxxxxxxxxxx>
>> --~--