RE: [xsl] Comparing two xml documents

Subject: RE: [xsl] Comparing two xml documents
From: "Lars Huttar" <lars_huttar@xxxxxxx>
Date: Wed, 12 Mar 2003 16:15:40 -0600

Hi Ragulf,
Dimitre has kind of popped this bubble with his pointer to xmlDiff :-),
but here are some thoughts for doing-it-yourself...

Btw I'm not sure what you meant when you said

> I have two flat xml documents
             ^^^^
Does this mean their document elements have no grandchildren?
(Not that it matters much... we can do without that assumption.)

Anyway, to answer a couple of your specific questions...

> <!-- Doing the transformation on A -->
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <xsl:transform version="1.0" 
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>   <xsl:variable name="B" select="document('B.xml')"/>
> 
>   <xsl:template match="/">
>     <xsl:apply-templates select="RootElement">
>   </xsl:template>
> 
>   <xsl:template match="RootElement">
>     <xsl:apply-templates/>
>   </xsl:template>
> 
>   <xsl:template match="*"> <!-- Does this match all elements? -->

Yes, all elements that aren't more specifically matched by another
template, such as the above one.

>     <!-- Here I would like to be able to search for the 
> element with the 
> same name as the chosen one -->
> 
> 
>     <xsl:apply-templates select="$B/RootElement"/> <!-- Would 
> this select 
> the RootElement of B? -->

Yes... whether that's what you want or not...

>   </xsl:template>
> 
>   <xsl:template match="$B/RootElement">
>     <xsl:apply-templates/>
>   </xsl:template>
> 
>   <xsl:template match="$B/*"> <!-- Does this match the given 
> elements in 
> document B? -->

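As written, this template would match the same thing as the previous
one, namely the only child of $B's root node, i.e. $B/RootElement.
Two templates matching the same node at the same priority is an error.
You presumably mean the elements below RootElement, i.e.
RootElement/* or RootElement//*.

Note, though, that XSLT 1.0 does not allow a variable reference in a
match pattern at all, so match="$B/..." is itself an error.  Select the
nodes from B with xsl:apply-templates and use a mode to keep the two
documents apart, as in the solution further down.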

>     <!-- And here I want to check whether the element exists 
> in document A. 
> Will I need a variable for that document? And how would I do that, if 
> needed? -->

You could declare another global variable near "B",

  <xsl:variable name="A" select="/" />

There may be another way but I don't know it.  (I thought document()
had a way to do this but I don't see it.)

>   </xsl:template>
> 
> 
> </xsl:transform>


Anyway, here is a solution that works:


<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:variable name="A" select="/" />
  <xsl:variable name="B" select="document('try-compare2.xml')"/>

  <!-- Replace // with / everywhere if we're only interested
       in immediate children of /RootElement. -->

  <xsl:template match="/">
    <!-- Process all descendant nodes in A: -->
    <xsl:apply-templates select="/RootElement//*" mode="A" />
    <!-- Process all descendant nodes in B: -->
    <xsl:apply-templates select="$B/RootElement//*" mode="B" />
  </xsl:template>

  <xsl:template match="*" mode="A">
    <!-- Figure out whether this A node has a namesake in B,
         and output an appropriate message. -->
    <xsl:variable name="curname" select="name()" />
    <xsl:variable name="matching" select="$B/RootElement//*[name() = $curname]" />
    <p>
      There is an element &lt;<xsl:value-of select="name()" />&gt; in
      <xsl:choose>
        <xsl:when test="$matching">
          both documents.
          <xsl:if test="string(.) != string($matching)">
            But they differ: '<xsl:value-of select="." />' !=
            '<xsl:value-of select="$matching" />'.
          </xsl:if>
        </xsl:when>
        <xsl:otherwise> document A only. </xsl:otherwise>
      </xsl:choose>
    </p>
  </xsl:template>

  <xsl:template match="*" mode="B">
    <!-- Figure out whether this B node has a namesake in A.
         If not, mention it. -->
    <xsl:variable name="curname" select="name()" />
    <xsl:variable name="matching" select="$A/RootElement//*[name() = $curname]" />
    <xsl:if test="not($matching)">
      <p>
        There is an element &lt;<xsl:value-of select="name()" />&gt;
        only in document B.
      </p>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>


Later Ragulf wrote (re: xmlDiff):

> Hi Dimitre,
> 
> Well, one reason was that I did not know the site.
> 
> Another reason, now that I know the site, is because the 
> files are way over 
> 200 KB which are the maximum sizes (or size) for the files to 
> be compared.

It seems like you could save a lot of cycles by first sorting both
documents by element name, then going through both sorted lists at once
using linear recursion: compare the "next" element in each sorted
document, and advance whichever "next" pointer compares earlier (or
both, if the names are equal).
[Does this mean you have to convert RTFs to node-sets? I'm not sure.]

This would give you O(n log n + m log m) time instead of O(n * m) where
n and m are the number of nodes in the two documents.
(I imagine xmlDiff does use a sort, but it was written using .NET,
not XSLT.)
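
Short of a full sort-and-merge, a cheaper trick that stays within plain
XSLT 1.0 is to index the element names with xsl:key, so that each
lookup no longer rescans the other document.  This is a different
technique from the sorting idea above, and only a rough sketch:
'B.xml' is a placeholder file name, and the <report> and <only-in>
output elements are names I made up for illustration.  Most processors
implement keys with some kind of hash table, but the spec doesn't
promise that, so measure before trusting it.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:variable name="A" select="/" />
  <!-- 'B.xml' is a placeholder for the second document. -->
  <xsl:variable name="B" select="document('B.xml')" />

  <!-- Index every element below RootElement by its name.
       The index is built separately for each document it is used in. -->
  <xsl:key name="by-name" match="RootElement//*" use="name()" />

  <xsl:template match="/">
    <report>
      <xsl:apply-templates select="/RootElement//*" mode="A" />
      <xsl:apply-templates select="$B/RootElement//*" mode="B" />
    </report>
  </xsl:template>

  <xsl:template match="*" mode="A">
    <xsl:variable name="curname" select="name()" />
    <!-- Switch the context to B, then look the name up there. -->
    <xsl:variable name="found">
      <xsl:for-each select="$B">
        <xsl:if test="key('by-name', $curname)">yes</xsl:if>
      </xsl:for-each>
    </xsl:variable>
    <xsl:if test="$found = ''">
      <only-in doc="A" name="{$curname}" />
    </xsl:if>
  </xsl:template>

  <xsl:template match="*" mode="B">
    <xsl:variable name="curname" select="name()" />
    <!-- Same lookup in the other direction, against A. -->
    <xsl:variable name="found">
      <xsl:for-each select="$A">
        <xsl:if test="key('by-name', $curname)">yes</xsl:if>
      </xsl:for-each>
    </xsl:variable>
    <xsl:if test="$found = ''">
      <only-in doc="B" name="{$curname}" />
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

key() only searches the document that contains the context node, which
is why each lookup is wrapped in an xsl:for-each over $A or $B.  This
version only reports names missing from one side; the string-value
comparison from the solution above could be added back the same way.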

On the other hand, the 200KB limit may be just the max load they want
you to put on their server.  If you download xmlDiff and run it yourself
it might work with your data.

If not, let me know... this is an interesting problem.

Lars


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

