Re: [xsl] Friday challenge: XSLT thats creates XPaths for meaningfully equivalent comparisons of XML files

Subject: Re: [xsl] Friday challenge: XSLT thats creates XPaths for meaningfully equivalent comparisons of XML files
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Fri, 13 Apr 2007 07:25:06 -0700
Some quick thoughts:

<checkXML>
  <xml src="file:/C:/test.xml">
     <check>/root[1]/foo[1]/text[1] = 'foo'</check>
     <check>/root[1]/foo[1]/@fooatt = 'att'</check>
     <check>/root[1]/bar[1]/text[1] = 'bar'</check>
     <check>/root[1]/bar[2]/text[1] = 'baz'</check>
  </xml>
</checkXML>


1. Checking that all of the above XPath expressions evaluate to true()
does not guarantee that the two documents are the same. One of them
could be a "prefix" of the other: the other document has the same
first N nodes in document order, but also has more nodes after those
"first N nodes". Therefore, an essential XPath expression that is
missing is:
    count(//node() | //@* | //namespace::*)  = N
where N is the corresponding count for the original document.

This XPath expression also illustrates that some of its subexpressions,
as well as the right-hand side of the equality test, depend on how
"document equality" is defined: do all attribute and namespace nodes
matter? Do we take comment nodes and/or processing instructions into
account? And so on.

Some people even consider the following to be different:

<someElement/>

and

<someElement></someElement>

as well as many similar, purely lexical differences (escaped text vs.
CDATA sections, double vs. single quotes, an explicit declaration of a
namespace node already inherited from the parent, the order of
attributes, etc.)


2. Even more importantly, even once all the issues described in 1.
above have been solved or agreed upon, the fact that an XSLT 2.0
transformation and an XSLT 1.0 transformation produce the same result
for a given document *does not guarantee* that the two transformations
will produce the same result when applied to another XML document.

In other words, the proposed tool can be effective in showing that two
transformations do not produce the same results, but it cannot be used
to establish that two transformations will always produce the same
result.
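Returning to the count check in point 1: a minimal sketch of how a generator stylesheet could emit it, with N computed from the reference document it is run against. In practice this single `<check>` would be added inside the `<xml>` element of a generator like the one quoted below. It assumes an XSLT 2.0 processor that supports the namespace axis (deprecated in XPath 2.0, where support for it is implementation-defined), and it counts every node kind, including whitespace-only text nodes, comments and processing instructions, so it would need adjusting to whatever definition of equality is agreed on.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Sketch: emit the node-count check from point 1. N is the count
       of element, text, comment, PI, attribute and namespace nodes in
       the reference document. Assumes namespace-axis support. -->
  <xsl:template match="/">
    <check>
      <xsl:value-of select="concat(
          'count(//node() | //@* | //namespace::*) = ',
          count(//node() | //@* | //namespace::*))"/>
    </check>
  </xsl:template>

</xsl:stylesheet>
```

Whitespace-only text nodes are counted as the parser delivers them, so both documents must be parsed with the same whitespace handling for the comparison to be meaningful.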




--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
You've achieved success in your field when you don't know whether what you're doing is work or play



On 4/13/07, Andrew Welch <andrew.j.welch@xxxxxxxxx> wrote:
This is a slight variation of the "generate xpaths to all elements"
problem. Here the goal is to generate, from a given XML document, a
set of XPaths such that if all of them return true when applied to
another XML document, the two XML documents are meaningfully
equivalent.

Why? Well, before setting about upgrading existing transforms to 2.0,
or, say, performance tuning, it's useful to have tests in place to
ensure the output after the modifications is still the same as before.
One simple way to do this is to use a tool like CheckXML with a set of
generated XPaths.

So for this input:

<root>
       <foo fooatt="att">foo</foo>
       <bar>bar</bar>
       <bar>baz</bar>
</root>

And this transform:

<xsl:stylesheet version="2.0"
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
       xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xsl:template match="/">
       <checkXML>
               <xml src="{document-uri(/)}">
                       <xsl:for-each select="//*">
                               <!-- index each step among its own same-named siblings -->
                               <xsl:variable name="path" as="xs:string"
                                       select="concat('/', string-join(
                                               for $x in ancestor-or-self::*
                                               return concat($x/local-name(), '[',
                                                       count($x/preceding-sibling::*[name() = name($x)]) + 1, ']'),
                                               '/'))"/>
                               <xsl:for-each select="text()[normalize-space(.) != '']">
                                       <check><xsl:value-of select="concat($path, '/text[', position(), '] = ''', ., '''')"/></check>
                               </xsl:for-each>
                               <xsl:for-each select="@*">
                                       <check><xsl:value-of select="concat($path, '/@', name(), ' = ''', ., '''')"/></check>
                               </xsl:for-each>
                       </xsl:for-each>
               </xml>
       </checkXML>
</xsl:template>

</xsl:stylesheet>

The result is:

<checkXML>
  <xml src="file:/C:/test.xml">
     <check>/root[1]/foo[1]/text[1] = 'foo'</check>
     <check>/root[1]/foo[1]/@fooatt = 'att'</check>
     <check>/root[1]/bar[1]/text[1] = 'bar'</check>
     <check>/root[1]/bar[2]/text[1] = 'baz'</check>
  </xml>
</checkXML>

The above transform is a quick attempt to demonstrate the problem; I'm
sure it can be improved on.
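One way the generator could be tightened, as a sketch assuming an XSLT 2.0 processor and that the checking tool evaluates the generated strings as standard XPath 2.0 (CheckXML's own syntax may differ). In standard XPath, `text[1]` selects a child element named `text`, so this version emits `text()[n]`; it numbers each text node among all of its text-node siblings, whitespace-only ones included, so that `n` matches what the evaluator will see; and it doubles any apostrophe in a value, which is how XPath 2.0 escapes a quote inside a single-quoted string literal. Steps are still matched by local name only, so input that uses namespaces is not handled.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <checkXML>
      <xml src="{document-uri(/)}">
        <xsl:for-each select="//*">
          <!-- Index each ancestor-or-self step among its own same-named
               siblings (local names only: namespaces are ignored). -->
          <xsl:variable name="path" as="xs:string" select="
              concat('/', string-join(
                for $x in ancestor-or-self::*
                return concat(local-name($x), '[',
                  count($x/preceding-sibling::*[local-name() = local-name($x)]) + 1, ']'),
                '/'))"/>
          <!-- Number text nodes among ALL text siblings, so text()[n]
               addresses the same node when the check is evaluated. -->
          <xsl:for-each select="text()[normalize-space(.) != '']">
            <check>
              <xsl:value-of select="concat($path, '/text()[',
                  count(preceding-sibling::text()) + 1, '] = ''',
                  replace(., '''', ''''''), '''')"/>
            </check>
          </xsl:for-each>
          <!-- Attribute checks; apostrophes in values are doubled. -->
          <xsl:for-each select="@*">
            <check>
              <xsl:value-of select="concat($path, '/@', name(), ' = ''',
                  replace(., '''', ''''''), '''')"/>
            </check>
          </xsl:for-each>
        </xsl:for-each>
      </xml>
    </checkXML>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input above, this should yield checks such as `/root[1]/foo[1]/text()[1] = 'foo'` and `/root[1]/bar[2]/text()[1] = 'baz'`.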

Also, if anyone has thoughts about this approach, it would be good to
hear them.

cheers
andrew

ps. CheckXML is a work in progress, contact me if you'd like to be involved.
