Re: [xsl] comparing nodesets to each other

Subject: Re: [xsl] comparing nodesets to each other
From: "Kai Hackemesser" <kaha@xxxxxx>
Date: Mon, 11 Apr 2005 21:46:25 +0200 (MEST)
Hello, Aron,

Let me try to be more exact in my definition:
- two 'relation' nodes count as different if they have the same value in
relation/Attribute[@Name='FindNumber']/Value but the overall text value of
their children differs.
- a 'relation' node must be listed, too, if the other list has no
corresponding 'relation' node with the same
relation/Attribute[@Name='FindNumber']/Value
- I need to know in which list a node is changed/added/removed.
- The whole list of changes needs to be sorted by the
Attribute[@Name='FindNumber']/Value
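
To make the rules above concrete, here is a minimal sketch of them outside
XSLT, in Python with the standard xml.etree.ElementTree module. It is only an
illustration under stated assumptions: the names index_relations, compare and
relation are hypothetical, the sample data is a cut-down version of my
structure (the elided "float" Attribute is left out), "text value" is taken
to mean all descendant text with whitespace collapsed, and FindNumbers are
assumed to be zero-padded so that a plain string sort orders them correctly.

```python
import xml.etree.ElementTree as ET

# Path of the identifying value, relative to each <relation> (from the rules above).
KEY_PATH = "Attribute[@Name='FindNumber']/Value"


def index_relations(xml_text):
    """Map each FindNumber to the whitespace-normalized text of its <relation>."""
    index = {}
    for rel in ET.fromstring(xml_text).findall("relation"):
        key = rel.findtext(KEY_PATH, default="").strip()
        # Join all descendant text, then collapse runs of whitespace.
        index[key] = " ".join("".join(rel.itertext()).split())
    return index


def compare(xml_a, xml_b):
    """Return (FindNumber, status) pairs sorted by FindNumber."""
    a, b = index_relations(xml_a), index_relations(xml_b)
    changes = []
    # Plain string sort; assumes zero-padded FindNumbers such as "0005".
    for key in sorted(set(a) | set(b)):
        if key not in b:
            changes.append((key, "removed"))   # only in list a
        elif key not in a:
            changes.append((key, "added"))     # only in list b
        elif a[key] != b[key]:
            changes.append((key, "changed"))   # same key, different text
    return changes


def relation(find_number, part):
    """Hypothetical helper: build one simplified <relation> for the demo."""
    return (
        "<relation>"
        '<Attribute Type="string" Name="FindNumber">'
        f"<Value><![CDATA[{find_number}]]></Value></Attribute> "
        '<object><Attribute Type="string" Name="PartNumber">'
        f"<Value><![CDATA[{part}]]></Value></Attribute></object>"
        "</relation>"
    )


LIST_A = ("<object>" + relation("0005", "Part1") + relation("0010", "Part2")
          + relation("0015", "Part3") + "</object>")
LIST_B = ("<object>" + relation("0005", "Part1")
          + relation("0015", "Part3b") + "</object>")

print(compare(LIST_A, LIST_B))
```

For the two sample lists this reports 0010 as removed and 0015 as changed.
Note that joining the text can hide a difference when the markup has no
whitespace between values ("ab" + "c" reads the same as "a" + "bc"), so a
stricter test might compare the values one by one.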

Regards, Kai

> Kai,
> 
> IMO the general problem of finding the differences between any 2 XML
> documents is, shall we say, challenging.  Something that helps such an
> operation is being extremely precise about what constitutes a difference,
> and being able to formulate precedence rules in comparison operations.  An
> earlier respondent illustrated the need for this with an example that
> "added" a node in the second document.  It's very likely *you* have a good
> idea of what you're after, but in these types of problems you'll get the
> most help if you can express your "rules for comparison" in [formal]
> written form.
> 
> Consider the following documents:
> 
> doc1.xml
> =======
> <doc>
> <chapter n="1"/>
> <chapter n="2"/>
> </doc>
> 
> doc2.xml
> =======
> <doc>
> <chapter n="1"/>
> <chapter n="2">
>   <para n="1"/>
> </chapter>
> </doc>
> 
> What *exactly* would you like in your final output?  Do you want to see
> only the node <para n="1"/>?  Do you want to see <para n="1"/> and all its
> parent nodes?  You see where this is going?  It helps to be precise.
> 
> Also, while writing a "general" differencing algorithm would be
> worthwhile, it's probably not simple.  To start, you'll have better luck
> if you constrain your problem as it relates to your domain.  One way to do
> this is by identifying a least granular level for your purposes--perhaps a
> node or "level" below which identifying differences is superfluous.  In
> the example above, you could say:
> 
> --chapter nodes are compared by their "n" attribute
> --if there are any differences between 2 <chapter> nodes or any of their
> descendants, the entire <chapter> node is considered "changed", and that
> of doc2.xml is output
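
The two chapter rules just quoted can be sketched in a few lines. This is only
an illustration outside XSLT, in Python (3.8 or later, for ET.canonicalize);
the names canonical and changed_chapters are hypothetical, and chapters that
exist only in doc1.xml are not reported, because the rules do not say what to
do with them.

```python
import xml.etree.ElementTree as ET

DOC1 = '<doc><chapter n="1"/><chapter n="2"/></doc>'
DOC2 = '<doc><chapter n="1"/><chapter n="2"><para n="1"/></chapter></doc>'


def canonical(elem):
    """Serialize a subtree in canonical form, ignoring indentation-only text."""
    return ET.canonicalize(ET.tostring(elem, encoding="unicode"), strip_text=True)


def changed_chapters(doc1_text, doc2_text):
    """Return {n: doc2 chapter} for chapters that are new or differ from doc1."""
    # Rule 1: chapters are matched by their "n" attribute.
    old = {ch.get("n"): canonical(ch)
           for ch in ET.fromstring(doc1_text).findall("chapter")}
    changed = {}
    for ch in ET.fromstring(doc2_text).findall("chapter"):
        new = canonical(ch)
        # Rule 2: any difference in the chapter or its descendants marks the
        # whole chapter as changed, and the doc2 version is what gets output.
        if old.get(ch.get("n")) != new:
            changed[ch.get("n")] = new
    return changed


for n, chapter in changed_chapters(DOC1, DOC2).items():
    print(n, chapter)
```

Comparing canonical serializations keeps the test at the chapter level, so
nothing below a chapter has to be matched up node by node.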
> 
> I've done this type of "constrained" comparison with success.
> 
> Here's another approach to consider: preprocess each xml document to a
> "standard" format, then use a textual diff tool.  The idea here is that
> you apply an XSL transform to doc1.xml so that <chapter> nodes are
> sequential, their descendants are ordered in a specific way, etc.  Do the
> same with doc2.xml.  Then use a diff tool ( eg: beyondcompare, from
> http://www.scootersoftware.com/ ) to check differences.  Note, this method
> is sensitive to line breaks, so it's not trivial to implement.
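
As a rough sketch of this normalize-then-diff idea, here is one way it could
look in Python (3.9 or later, for ET.indent) instead of an XSL transform plus
an external diff tool. The names normalize and text_diff are hypothetical, and
sorting chapters by their "n" attribute as a string is an assumption about
what "ordered in a specific way" should mean here.

```python
import difflib
import xml.etree.ElementTree as ET


def normalize(xml_text):
    """Return the document as lines in one fixed, comparable layout."""
    # Canonical form with strip_text=True drops indentation-only text nodes.
    root = ET.fromstring(ET.canonicalize(xml_text, strip_text=True))
    # Put the top-level children in a fixed order (string sort on "n").
    root[:] = sorted(root, key=lambda child: child.get("n", ""))
    # Re-indent so every element starts on its own line.
    ET.indent(root)
    return ET.tostring(root, encoding="unicode").splitlines()


def text_diff(doc1_text, doc2_text):
    """Line-based unified diff of the two normalized documents."""
    return list(difflib.unified_diff(
        normalize(doc1_text), normalize(doc2_text),
        "doc1.xml", "doc2.xml", lineterm=""))


DOC1 = '<doc><chapter n="2"/><chapter n="1"/></doc>'
DOC2 = '<doc>\n<chapter n="1"/>\n<chapter n="2">\n  <para n="1"/>\n</chapter>\n</doc>'

print("\n".join(text_diff(DOC1, DOC2)))
```

Because both inputs are re-indented by the same routine, differences in the
original line breaks and chapter order do not show up in the diff; only the
added <para> and the changed <chapter> tags do.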
> 
> Regards
> 
> --A
> 
> 
> 
> >From: "Kai Hackemesser" <kaha@xxxxxx>
> >Reply-To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> >To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> >Subject: Re: [xsl] comparing nodesets to each other
> >Date: Mon, 11 Apr 2005 18:18:47 +0200 (MEST)
> >
> >Hello, David,
> >
> >Thanks for the response. The errors you mentioned have already happened,
> >which is why I'm currently clueless about how to solve this.
> >
> >Let me show the structure of the recipe (simplified):
> >
> ><object>
> >   <relation>
> >     <Attribute Type="string" Name="FindNumber">
> >       <Value><![CDATA[0005]]></Value>
> >     </Attribute>
> >     <Attribute Type="float" Name="...
> >     <object>
> >       <Attribute Type="string" Name="PartNumber">
> >         <Value><![CDATA[Part1]]></Value>
> >       </Attribute>
> >     </object>
> >   </relation>
> >   <relation>
> >     <Attribute Type="string" Name="FindNumber">
> >       <Value><![CDATA[0010]]></Value>
> >     </Attribute>
> >     <Attribute Type="float" Name="...
> >     <object>
> >       <Attribute Type="string" Name="PartNumber">
> >         <Value><![CDATA[Part2]]></Value>
> >       </Attribute>
> >     </object>
> >   </relation>
> >   <relation>
> >     <Attribute Type="string" Name="FindNumber">
> >       <Value><![CDATA[0015]]></Value>
> >     </Attribute>
> >     <Attribute Type="float" Name="...
> >     <object>
> >       <Attribute Type="string" Name="PartNumber">
> >         <Value><![CDATA[Part3]]></Value>
> >       </Attribute>
> >     </object>
> >   </relation>
> ></object>
> >
> >needs to be compared against a similar structure:
> ><object>
> >   <relation>
> >     <Attribute Type="string" Name="FindNumber">
> >       <Value><![CDATA[0005]]></Value>
> >     </Attribute>
> >     <Attribute Type="float" Name="...
> >     <object>
> >       <Attribute Type="string" Name="PartNumber">
> >         <Value><![CDATA[Part1]]></Value>
> >       </Attribute>
> >     </object>
> >   </relation>
> >   <relation>
> >     <Attribute Type="string" Name="FindNumber">
> >       <Value><![CDATA[0015]]></Value>
> >     </Attribute>
> >     <Attribute Type="float" Name="...
> >     <object>
> >       <Attribute Type="string" Name="PartNumber">
> >         <Value><![CDATA[Part3b]]></Value>
> >       </Attribute>
> >     </object>
> >   </relation>
> ></object>
> >
> >(There is more than one Attribute node per object or relation node.)
> >
> >So I need to extract all differences, such as attribute changes, missing
> >nodes, altered nodes, and added nodes. To identify a node I use the
> >FindNumber Attribute node of each relation node. A missing node is one
> >where the corresponding FindNumber Attribute value is missing in nodelist
> >'b'. An added node is one where the corresponding FindNumber Attribute
> >value is missing in nodelist 'a'. An altered node means the FindNumber
> >Attribute value is there in both nodelists, but the Attribute nodes or
> >the object/Attribute nodes are different. I think a simple text compare
> >would be enough for the test of alteration.
> >
> >Regards,
> >Kai
> >
> 
