Re: [xsl] comparing XML document structure

Subject: Re: [xsl] comparing XML document structure
From: Graydon <graydon@xxxxxxxxx>
Date: Wed, 17 Aug 2011 21:35:06 -0400
On Thu, Aug 18, 2011 at 12:44:02AM +0100, Tony Graham scripsit:
> On Wed, August 17, 2011 11:48 pm, Wendell Piez wrote:
> > It sounds like you want to infer content models on the fly and then
> > validate against them. I can imagine approaches to this, but I doubt
> > that I'd trust many algorithms that actually attempted it -- not because
> > of XSLT, but because of the problem specifying the problem.
> ...
> > But why not use a schema? There are processors such as Trang that can
> > infer schemas from documents.
> 
> What Wendell said.

Using trang to generate a schema from the DTD in question has
historically tended to fail.  (Not badly, but somewhat; the result is
generally usable as a schema to get saxon to validate the output, but
not usable on the fly for structure.)

So I've got a relatively fixed content model, in the form of a
comprehensive DTD and a much less comprehensive example of how to use
that DTD for a particular content type.

Initially, what I want to do is eat the exemplar, use it to generate a
parent-child list -- so I'd have section/num, section/para, and
section/subsection -- and then take an output file and get the same list
from it, then compare the lists and produce a message for mismatches.
So if a particular output file had section/num, section/subsection, and
section/list in it, for example, there should be an exception noted for
section/list. (Valid, but not expected.)
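
As a rough illustration of that comparison, here is a minimal sketch in
Python rather than XSLT, using only the standard library.  The function
names and the two inline documents are made up for the example; the
element names are the ones mentioned above.  It assumes both documents
parse without needing the external DTD (no entities defined only in the
DTD) and that the element names are not namespaced.

```python
import xml.etree.ElementTree as ET


def parent_child_pairs(xml_text):
    """Return the set of 'parent/child' element-name pairs in a document."""
    root = ET.fromstring(xml_text)
    pairs = set()
    # iter() visits every element; iterating an element yields its children.
    for parent in root.iter():
        for child in parent:
            pairs.add(f"{parent.tag}/{child.tag}")
    return pairs


def unexpected_pairs(exemplar_text, output_text):
    """Pairs found in the output file that the exemplar never shows."""
    return sorted(parent_child_pairs(output_text) - parent_child_pairs(exemplar_text))


# Hypothetical stand-ins for the exemplar and one output file.
EXEMPLAR = "<section><num>1</num><para>Optional intro text</para><subsection/></section>"
OUTPUT = "<section><num>1</num><subsection/><list/></section>"

for pair in unexpected_pairs(EXEMPLAR, OUTPUT):
    print(f"Not in exemplar: {pair}")  # prints "Not in exemplar: section/list"
```

The set difference only runs one way: pairs the exemplar has and the
output lacks (section/para here) are not reported, since the exemplar
marks some content as optional.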
> ...
> > On 8/17/2011 5:57 PM, Graydon wrote:
> ...
> >> The desired goal is to be able to programmatically pull the structure,
> >> at least to the extent of parent-child element pairs, from the
> >> semantics-defining file, and compare that to each output file in turn.
> >>
> >> So if the semantics-defining file gives an example section element,
> >> which has num, para, and subsection element children, what I want to be
> >> able to do is create a sequence of axis relationships and test the
> >> section elements of the output for axis relationships that are not
> >> members of that sequence.
> 
> It would help the rest of us wrap our heads around the problem if you
> could provide a sample fragment of the "semantics-defining file" so we can
> see what you are dealing with.

It would, but the whole NDA thing rears its ugly head.

It's just a document, to the same DTD as the output.  Instead of having
actual content in it, it has things like <para>This para is optional; if
present, it should contain introductory text</para> in it.

> You may be able to create the tests you want in Schematron, but it's a bit
> hard to tell without having an example to look at.  (If you can generate
> Schematron from your definitions, you could directly create XSLT for the
> axis tests about as easily, but the advantage could be that there are
> tools such as XML IDEs that already understand the Schematron report
> format.)

Schematron is certainly something to look at, yes.
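
A minimal sketch of what generating Schematron from the exemplar might
look like, again in Python and purely as an assumption about the
approach: the function name and the input mapping (parent element name
to the set of child names seen in the exemplar) are invented for the
example, and namespaced element names are not handled.

```python
from xml.sax.saxutils import quoteattr


def schematron_for(allowed):
    """Build a Schematron schema that reports children absent from the exemplar.

    `allowed` maps a parent element name to the set of child element
    names seen under it in the exemplar (a hypothetical input format).
    """
    lines = ['<schema xmlns="http://purl.oclc.org/dsdl/schematron">', "  <pattern>"]
    for parent in sorted(allowed):
        known = " or ".join(f"self::{child}" for child in sorted(allowed[parent]))
        # With no known children, every child element is unexpected.
        test = f"not({known})" if known else "true()"
        lines += [
            f"    <rule context={quoteattr(parent + '/*')}>",
            f"      <report test={quoteattr(test)}>",
            f"        <name/> does not appear under {parent} in the exemplar.",
            "      </report>",
            "    </rule>",
        ]
    lines += ["  </pattern>", "</schema>"]
    return "\n".join(lines)


print(schematron_for({"section": {"num", "para", "subsection"}}))
```

Each rule fires on the children of one parent, so an unexpected child
gets its own report, with <name/> supplying the element name.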

Thanks!
Graydon
