Subject: RE: [jats-list] validating NLM using python, any tips?
From: "Maloney, Christopher (NIH/NLM/NCBI) [C]" <maloneyc@xxxxxxxxxxxxxxxx>
Date: Fri, 29 Nov 2013 15:36:51 +0000

Ian,

I think the problem with your script is that `etree.DTD` doesn't know how to resolve the external entities that are declared in the DTD. Because the DTDs are modularized, when you read the DTD into a string with `r = requests.get(...)`, the string still has a lot of external entity references like, for example,

    <!ENTITY % archivecustom-modules.ent PUBLIC
      "-//NLM//DTD Journal Archiving DTD-Specific Modules v3.0 20080202//EN"
      "archivecustom-modules3.ent" >
    %archivecustom-modules.ent;

and these are not getting resolved. If you first flatten the DTD, and read it in from a local file, then it works.

Chris

________________________________________
From: Rajagopal CV [cvr3@xxxxxxxxxxxxxxxx]
Sent: Friday, November 29, 2013 2:51 AM
To: jats-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [jats-list] validating NLM using python, any tips?

Yet another method is to use an ant script.

    <project basedir="./" default="parse" name="crossplatform.script">
      <property name="NLM-DTD-resources" value="NLM-DTD-resources"/>
      <xmlcatalog id="nlm.dtds">
        <dtd publicId="-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN"
             location="${basedir}/${NLM-DTD-resources}/dtd/NLM/publishing/journalpublishing3.dtd"/>
      </xmlcatalog>
      <target name="parse">
        <echo>Validating the ${input}</echo>
        <xmlvalidate failonerror="yes" warn="yes" file="${input}">
          <xmlcatalog refid="nlm.dtds"/>
        </xmlvalidate>
      </target>
    </project>

Save the above in a file named "build.xml" and keep the NLM DTD resources in a folder named "NLM-DTD-resources", then run

    ant -buildfile build.xml -Dinput=file.xml

OR

    ant -Dinput=file.xml

The only advantage is that this is cross-platform :-) I too use xmllint on a linux box, which is very handy.

--
Rajagopal

On Thu, Nov 28, 2013 at 10:55 PM, Alf Eaton <eaton.alf@xxxxxxxxx> wrote:
> From the command line you can use xmllint (brew install libxml2):
>
>     xmllint --noout --loaddtd --valid file.xml
>
> On 28 November 2013 16:55, Ian Mulvany <i.mulvany@xxxxxxxxxxxxxxxxx> wrote:
>> Hi All,
>>
>> I'm building a small script in python to generate NLM XML. I would
>> like a companion script to validate, preferably also in python.
>>
>> The generating script is a work in progress, but you can review it here:
>> https://github.com/elifesciences/elife-poa-xml-generation/blob/working/generate-poa-xml.py
>>
>> My initial attempt to get the NLM DTD for validation failed, the script here:
>> https://github.com/elifesciences/elife-poa-xml-generation/blob/working/validate.py
>>
>> returned
>> Traceback (most recent call last):
>>   File "validate.py", line 9, in <module>
>>     dtd = etree.DTD(StringIO(NLM_DTD))
>>   File "dtd.pxi", line 287, in lxml.etree.DTD.__init__ (src/lxml/lxml.etree.c:150450)
>>   File "dtd.pxi", line 394, in lxml.etree._parseDtdFromFilelike (src/lxml/lxml.etree.c:152160)
>>
>> Has anyone done this in python, if so do you have code you could share?
>>
>> If I can't get it to work in python, should I consider an alternative
>> route, what would you suggest?
>>
>> I'm developing on a mac using OSX Mavericks, but I could also build on
>> a linux box.
>>
>> - Ian
>>
>> ---
>> Head of Technology - eLife
>> Submit now - http://submit.elifesciences.org/
>> twitter: @IanMulvany

--~------------------------------------------------------------------
JATS-List info and archive: http://www.mulberrytech.com/JATS/JATS-List/
To unsubscribe, go to: http://lists.mulberrytech.com/jats-list/
or e-mail: <mailto:jats-list-unsubscribe@xxxxxxxxxxxxxxxxxxxxxx>
--~--
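
The flattening step Chris describes can be approximated in Python itself. What follows is a rough sketch, not the tool used to produce the official flattened DTDs: `flatten_dtd` is a hypothetical helper that textually inlines external parameter entities, assuming every module is a UTF-8 file reachable through its relative system identifier (the NLM DTD package unpacked locally). It ignores catalogs and does not evaluate conditional sections, so the output should be checked before it is relied on.

```python
import re
from pathlib import Path

# One left-to-right pass over the DTD text. Comments are copied through
# untouched, external parameter entity declarations are recorded and dropped,
# and references to recorded entities are replaced by the module's content.
_TOKEN = re.compile(
    r"""
    (?P<comment><!--.*?-->)
    | <!ENTITY\s+%\s+(?P<decl>[\w.\-]+)\s+
      (?:PUBLIC\s+(?:"[^"]*"|'[^']*')|SYSTEM)\s+
      (?P<quote>["'])(?P<sysid>.*?)(?P=quote)\s*>
    | %(?P<ref>[\w.\-]+);
    """,
    re.DOTALL | re.VERBOSE,
)

# An external entity may start with a text declaration (<?xml ... ?>),
# which must not survive in the middle of the flattened output.
_TEXT_DECL = re.compile(r"^\s*<\?xml\b.*?\?>", re.DOTALL)


def flatten_dtd(path, _external=None, _stack=()):
    """Return the DTD at `path` with external parameter entities inlined."""
    path = Path(path).resolve()
    if path in _stack:
        raise ValueError(f"circular entity reference involving {path}")
    # The name -> file mapping is shared across the recursion, because a
    # module may declare entities that the including file references later.
    external = {} if _external is None else _external
    text = _TEXT_DECL.sub("", path.read_text(encoding="utf-8"), count=1)

    def expand(match):
        if match.group("comment"):
            return match.group(0)
        if match.group("decl"):
            # System identifiers are relative to the declaring file;
            # the first declaration of a name wins, as in XML.
            target = path.parent / match.group("sysid")
            external.setdefault(match.group("decl"), target)
            return ""
        name = match.group("ref")
        if name not in external:
            return match.group(0)  # internal entity: leave it for the parser
        return flatten_dtd(external[name], external, _stack + (path,))

    return _TOKEN.sub(expand, text)
```

Writing the returned string to a local file and loading that file with `etree.DTD`, as in Chris's reply, is then the remaining step.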