Re: [xsl] comparing XML document structure

Subject: Re: [xsl] comparing XML document structure
From: Graydon <graydon@xxxxxxxxx>
Date: Wed, 17 Aug 2011 21:38:46 -0400
On Wed, Aug 17, 2011 at 06:48:55PM -0400, Wendell Piez scripsit:
> Graydon,
> 
> It sounds like you want to infer content models on the fly and then
> validate against them. I can imagine approaches to this, but I doubt
> that I'd trust many algorithms that actually attempted it -- not
> because of XSLT, but because of the problem specifying the problem.

It's more that I've got a content model and want to see where things
fail, but yes.

> Falling short of the general case there are lots of things you could do:
> 
> <xsl:template match="section/*">
>   <xsl:variable name="expected-children"
>     select="distinct-values(/path/to/specification/section/*/name())"/>
>   <xsl:for-each select="*[not(name()=$expected-children)]">
>     element <xsl:value-of select="name()"/> not expected here
>   </xsl:for-each>
> </xsl:template>

Thank you!  That's much cleaner than my first kick at the problem.

> It's also possible to index elements by their names plus the names
> of parents (and additional criteria if necessary), if that's a help
> for retrieving things for comparison.
> 
> The bottom line is that while what you envision may not be
> practicable, that doesn't mean there aren't useful things that can
> be done.
> 
> But why not use a schema? There are processors such as Trang that
> can infer schemas from documents.

The target DTD is big and complex, and sometimes has multiple entity
replacements (dev version versus production version).  Trang has
serious issues schemafying it.

The issue is also not one of validity; the content is valid before it
gets to this step.  The problem is that the DTD is used very widely
(corporate goal to get all the content into the same DTD) but the part
of the DTD used for the particular content type is a much smaller
subset.

So what I'm trying to do is say "does this output file fit into the
subset?"
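
A minimal sketch of that subset check, using the indexing idea quoted
above (elements keyed by parent name plus own name).  It assumes there
is a reference document that exercises only the permitted subset; the
file name subset-sample.xml, the subset-uri parameter and the key name
are all hypothetical, and it only compares parent/child name pairs, not
order or cardinality.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical parameter: URI of a document that uses only the
       permitted subset of the DTD. -->
  <xsl:param name="subset-uri" select="'subset-sample.xml'"/>
  <xsl:variable name="subset" select="doc($subset-uri)"/>

  <!-- Index every element by "parent-name/own-name". -->
  <xsl:key name="by-parent-and-name" match="*"
           use="concat(name(..), '/', name())"/>

  <xsl:template match="/">
    <!-- Walk every element of the output file being checked. -->
    <xsl:for-each select="//*">
      <xsl:variable name="pair"
                    select="concat(name(..), '/', name())"/>
      <!-- Third argument of key() restricts the lookup to the
           subset reference document. -->
      <xsl:if test="empty(key('by-parent-and-name', $pair, $subset))">
        <xsl:value-of select="concat('element ', name(),
                                     ' not expected in ', name(..),
                                     '&#10;')"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against an output file, this would print one line per element whose
parent/child pairing never occurs in the reference document; an empty
result suggests, but does not prove, that the file stays inside the
subset.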

Thanks!
Graydon
