Subject: Re: [xsl] comparing nodesets to each other
From: "Kai Hackemesser" <kaha@xxxxxx>
Date: Mon, 11 Apr 2005 21:46:25 +0200 (MEST)
Hello, Aron,

Let me try to be more precise in my definition:

- Two 'relation' nodes are different if they have the same value in relation/Attribute[@Name='FindNumber']/Value but the text value of all their children, taken together, differs.
- A 'relation' node must also be listed if there is no corresponding 'relation' node with the same relation/Attribute[@Name='FindNumber']/Value in the other list.
- I need to know in which list a node was changed, added, or removed.
- The whole list of changes needs to be sorted by Attribute[@Name='FindNumber']/Value.

Regards,
Kai

> Kai,
>
> IMO the general problem of finding the differences between any two XML
> documents is, shall we say, challenging. Something that helps such an
> operation is being extremely precise about what constitutes a difference,
> and being able to formulate precedence rules for comparison operations. An
> earlier respondent illustrated the need for this with an example that
> "added" a node in the second document. It's very likely *you* have a good
> idea of what you're after, but in these types of problems you'll get the
> most help if you can express your "rules for comparison" in [formal]
> written form.
>
> Consider the following documents:
>
> doc1.xml
> =======
> <doc>
>   <chapter n="1"/>
>   <chapter n="2"/>
> </doc>
>
> doc2.xml
> =======
> <doc>
>   <chapter n="1"/>
>   <chapter n="2">
>     <para n="1"/>
>   </chapter>
> </doc>
>
> What *exactly* would you like in your final output? Do you want to see
> only the node <para n="1"/>? Do you want to see <para n="1"/> and all its
> parent nodes? You see where this is going? It helps to be precise.
>
> Also, while writing a "general" differencing algorithm would be
> worthwhile, it's probably not simple. To start, you'll have better luck if
> you constrain your problem as it relates to your domain. One way to do
> this is to identify a least granular level for your purposes, perhaps a
> node or "level" below which identifying differences is superfluous.
> In the example above, you could say:
>
> --chapter nodes are compared by their "n" attribute
> --if there are any differences between two <chapter> nodes or any of
> their descendants, the entire <chapter> node is considered "changed", and
> that of doc2.xml is output
>
> I've done this type of "constrained" comparison with success.
>
> Here's another approach to consider: preprocess each XML document to a
> "standard" format, then use a textual diff tool. The idea here is that you
> apply an XSL transform to doc1.xml so that <chapter> nodes are sequential,
> their descendants are ordered in a specific way, etc. Do the same with
> doc2.xml. Then use a diff tool (e.g. Beyond Compare, from
> http://www.scootersoftware.com/) to check the differences. Note that this
> method is sensitive to line breaks, so it's not trivial to implement.
>
> Regards
>
> --A
>
> >From: "Kai Hackemesser" <kaha@xxxxxx>
> >Reply-To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> >To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> >Subject: Re: [xsl] comparing nodesets to each other
> >Date: Mon, 11 Apr 2005 18:18:47 +0200 (MEST)
> >
> >Hello, David,
> >
> >Thanks for the response. The errors you mentioned have already happened;
> >that's why I'm currently clueless about how to solve it.
> >
> >Let me show the structure of the recipe (simplified):
> >
> ><object>
> >  <relation>
> >    <Attribute Type="string" Name="FindNumber">
> >      <Value><![CDATA[0005]]></Value>
> >    </Attribute>
> >    <Attribute Type="float" Name="...
> >    <object>
> >      <Attribute Type="string" Name="PartNumber">
> >        <Value><![CDATA[Part1]]></Value>
> >      </Attribute>
> >    </object>
> >  </relation>
> >  <relation>
> >    <Attribute Type="string" Name="FindNumber">
> >      <Value><![CDATA[0010]]></Value>
> >    </Attribute>
> >    <Attribute Type="float" Name="...
> >    <object>
> >      <Attribute Type="string" Name="PartNumber">
> >        <Value><![CDATA[Part2]]></Value>
> >      </Attribute>
> >    </object>
> >  </relation>
> >  <relation>
> >    <Attribute Type="string" Name="FindNumber">
> >      <Value><![CDATA[0015]]></Value>
> >    </Attribute>
> >    <Attribute Type="float" Name="...
> >    <object>
> >      <Attribute Type="string" Name="PartNumber">
> >        <Value><![CDATA[Part3]]></Value>
> >      </Attribute>
> >    </object>
> >  </relation>
> ></object>
> >
> >needs to be compared against a similar structure:
> >
> ><object>
> >  <relation>
> >    <Attribute Type="string" Name="FindNumber">
> >      <Value><![CDATA[0005]]></Value>
> >    </Attribute>
> >    <Attribute Type="float" Name="...
> >    <object>
> >      <Attribute Type="string" Name="PartNumber">
> >        <Value><![CDATA[Part1]]></Value>
> >      </Attribute>
> >    </object>
> >  </relation>
> >  <relation>
> >    <Attribute Type="string" Name="FindNumber">
> >      <Value><![CDATA[0015]]></Value>
> >    </Attribute>
> >    <Attribute Type="float" Name="...
> >    <object>
> >      <Attribute Type="string" Name="PartNumber">
> >        <Value><![CDATA[Part3b]]></Value>
> >      </Attribute>
> >    </object>
> >  </relation>
> ></object>
> >
> >(There is more than one Attribute node per object or relation node.)
> >
> >So I need to extract all differences: attribute changes, missing nodes,
> >altered nodes, and added nodes. To identify a node I use the FindNumber
> >Attribute node of each relation node. A missing node is one whose
> >FindNumber Attribute value is missing in node list 'b'. An added node is
> >one whose FindNumber Attribute value is missing in node list 'a'. An
> >altered node is one whose FindNumber Attribute value is present in both
> >node lists, but whose Attribute nodes or object/Attribute nodes are
> >different. I think a simple text comparison would be enough to test for
> >alteration.
> >
> >Regards,
> >Kai
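The comparison rules Kai lists at the top of this message (match 'relation' nodes on their FindNumber value, report nodes that exist in only one list, flag matched nodes whose text differs, sort everything by FindNumber) amount to a keyed comparison. Below is a minimal sketch of that idea in Python rather than XSLT; it is not a solution from the thread. The function names, the 'added'/'removed'/'changed' labels, the whitespace normalization and the trimmed sample documents are all illustrative assumptions. It also assumes FindNumber values are unique within each document and zero-padded, so that plain string sorting gives numeric order.

```python
import xml.etree.ElementTree as ET

def find_number(relation):
    """Key of a <relation>: text of Attribute[@Name='FindNumber']/Value, or None."""
    value = relation.find("Attribute[@Name='FindNumber']/Value")
    return value.text.strip() if value is not None and value.text else None

def text_value(relation):
    """All descendant text joined with '|'. Whitespace-only pieces are dropped so
    that indentation differences are not reported as changes (an assumption)."""
    return "|".join(t.strip() for t in relation.itertext() if t.strip())

def index_relations(root):
    """Map FindNumber -> text value for each <relation> child of root.
    Assumes FindNumber is unique per document; relations without one are skipped."""
    index = {}
    for relation in root.findall("relation"):
        key = find_number(relation)
        if key is not None:
            index[key] = text_value(relation)
    return index

def compare(root_a, root_b):
    """(FindNumber, status) pairs sorted by FindNumber, where status is
    'removed' (only in list a), 'added' (only in list b) or
    'changed' (in both lists, but the text values differ)."""
    a, b = index_relations(root_a), index_relations(root_b)
    changes = []
    for key in sorted(a.keys() | b.keys()):  # zero-padded keys: string order = numeric order
        if key not in b:
            changes.append((key, "removed"))
        elif key not in a:
            changes.append((key, "added"))
        elif a[key] != b[key]:
            changes.append((key, "changed"))
    return changes

def make_doc(parts):
    """Hypothetical helper: a trimmed <object> document from (FindNumber, PartNumber) pairs."""
    relations = "".join(
        f'<relation><Attribute Type="string" Name="FindNumber"><Value>{fn}</Value></Attribute>'
        f'<object><Attribute Type="string" Name="PartNumber"><Value>{pn}</Value></Attribute>'
        f"</object></relation>"
        for fn, pn in parts)
    return ET.fromstring(f"<object>{relations}</object>")

# Same data as the two sample documents above, minus the elided float Attribute.
doc_a = make_doc([("0005", "Part1"), ("0010", "Part2"), ("0015", "Part3")])
doc_b = make_doc([("0005", "Part1"), ("0015", "Part3b")])

print(compare(doc_a, doc_b))
```

On the sample data this prints [('0010', 'removed'), ('0015', 'changed')]; swapping the arguments reports 0010 as 'added' instead, which is how the sketch records "in which list" a difference occurs. Note that it only compares text, as Kai suggests, so a change to an XML attribute such as Type would not be detected.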
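Aron's alternative, which is to normalize both documents into a "standard" form and then run a textual diff, can also be sketched in Python, with the standard library's difflib standing in for an XSL transform plus an external diff tool. This is an assumption-laden sketch, not the method from the thread. It sorts 'relation' nodes by FindNumber, writes attributes in alphabetical order, keeps the document order of everything below 'relation', and generates exactly one line per element, so line breaks in the source files cannot affect the result. All function names are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET

def find_number(relation):
    """Sort key: text of Attribute[@Name='FindNumber']/Value ('' if absent)."""
    value = relation.find("Attribute[@Name='FindNumber']/Value")
    return (value.text or "").strip() if value is not None else ""

def canonical_lines(elem, depth):
    """One line per element: indentation, tag, attributes sorted by name,
    stripped text. Children keep their document order."""
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrib.items()))
    lines = [f"{'  ' * depth}<{elem.tag}{attrs}>{(elem.text or '').strip()}"]
    for child in elem:
        lines.extend(canonical_lines(child, depth + 1))
    return lines

def canonical_document(root):
    """Standard form of a document: its <relation> children sorted by FindNumber."""
    lines = [f"<{root.tag}>"]
    for relation in sorted(root.findall("relation"), key=find_number):
        lines.extend(canonical_lines(relation, 1))
    return lines

def text_diff(root_a, root_b):
    """Unified diff of the two canonical forms; an empty list if they match."""
    return list(difflib.unified_diff(canonical_document(root_a),
                                     canonical_document(root_b),
                                     fromfile="a", tofile="b", lineterm=""))

def make_doc(parts):
    """Hypothetical helper: a trimmed <object> document from (FindNumber, PartNumber) pairs."""
    relations = "".join(
        f'<relation><Attribute Type="string" Name="FindNumber"><Value>{fn}</Value></Attribute>'
        f'<object><Attribute Type="string" Name="PartNumber"><Value>{pn}</Value></Attribute>'
        f"</object></relation>"
        for fn, pn in parts)
    return ET.fromstring(f"<object>{relations}</object>")

# Same data as the two sample documents above, minus the elided float Attribute.
doc_a = make_doc([("0005", "Part1"), ("0010", "Part2"), ("0015", "Part3")])
doc_b = make_doc([("0005", "Part1"), ("0015", "Part3b")])

for line in text_diff(doc_a, doc_b):
    print(line)
```

In the output, lines that exist only in the first document start with "-" and lines that exist only in the second start with "+". As a result, the removed relation's FindNumber value and the changed PartNumber both appear. Compared with a keyed comparison, this approach shows *what* changed line by line. However, it leaves the reader to work out which relation each line belongs to.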