Re: [xsl] comparing XML document structure

Subject: Re: [xsl] comparing XML document structure
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 17 Aug 2011 18:48:55 -0400

It sounds like you want to infer content models on the fly and then validate against them. I can imagine approaches to this, but I doubt that I'd trust many algorithms that actually attempted it -- not because of XSLT, but because of the difficulty of specifying the problem.

Short of the general case, there are lots of things you could do. For example:

<xsl:template match="section">
  <!-- names of the children the specification allows in a section (XSLT 2.0) -->
  <xsl:variable name="expected-children" select="distinct-values(/path/to/specification/section/*/name())"/>
  <xsl:for-each select="*[not(name() = $expected-children)]">
    element <xsl:value-of select="name()"/> not expected here
  </xsl:for-each>
</xsl:template>

It's also possible to index elements by their names plus the names of parents (and additional criteria if necessary), if that's a help for retrieving things for comparison.
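
As a rough sketch of that indexing idea (not a tested solution): the stylesheet below keys every element on its parent's name plus its own name, then looks each parent/child pair from the output up in the exemplar. The parameter name spec-uri, its default value 'specification.xml', and the key name by-parent-and-name are all made up for illustration, and the three-argument key() assumes an XSLT 2.0 processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: URI of the exemplar (specification) document -->
  <xsl:param name="spec-uri" select="'specification.xml'"/>
  <xsl:variable name="spec" select="document($spec-uri)"/>

  <!-- Index every element by "parent-name/own-name" -->
  <xsl:key name="by-parent-and-name" match="*"
           use="concat(name(..), '/', name())"/>

  <xsl:template match="/">
    <xsl:for-each select="//*">
      <xsl:variable name="pair" select="concat(name(..), '/', name())"/>
      <!-- the third argument makes key() search the exemplar, not the input -->
      <xsl:if test="empty(key('by-parent-and-name', $pair, $spec))">
        <xsl:message>
          <xsl:value-of select="$pair"/> does not occur in the exemplar
        </xsl:message>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against each output file in turn, this reports only parent-child pairs that never occur in the exemplar. It says nothing about order, cardinality, or required children, which is where a real content model would come in.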

The bottom line is that while what you envision may not be practicable, that doesn't mean there aren't useful things that can be done.

But why not use a schema? There are processors such as Trang that can infer schemas from documents.


On 8/17/2011 5:57 PM, Graydon wrote:
So I have an XML document which defines the expected semantics of the
XML output of an SGML-to-XML conversion project as an exemplar; there
are structures like this, and like these, and like that.

I also have a whole bunch of XML output which ought to conform to that
semantics.  (This output is the product of a complex, multi-pass, highly
conditional set of XSLT transforms.)

The desired goal is to be able to programmatically pull the structure,
at least to the extent of parent-child element pairs, from the
semantics-defining file, and compare that to each output file in turn.

So if the semantics-defining file gives an example section element,
which has num, para, and subsection element children, what I want to be
able to do is create a sequence of axis relationships and test the
section elements of the output for axis relationships that are not
members of that sequence.

I'm nearly certain I can't do that, but thought it was much wiser to ask
and allow for the possibility of a pleasant surprise.

-- Graydon

-- 
======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
