Re: [xsl] Testing 2 XML documents for equality - a solution

Subject: Re: [xsl] Testing 2 XML documents for equality - a solution
From: Mukul Gandhi <mukul_gandhi@xxxxxxxxx>
Date: Mon, 4 Apr 2005 09:19:15 -0700 (PDT)
--- David Carlisle <davidc@xxxxxxxxx> wrote:
> 
> For the vast majority of nodes this is still a) a very
> expensive way of comparing them and b) doesn't help
> with the comparison.

I agree! I understand that generating the string hash
of the entire XML document is an expensive operation.
On reflection, I suspect that two different XML
documents could produce the same concatenated string
representation, so my algorithm will probably fail in
some cases. I have no proof of this yet, though. The
XML examples I ran through my stylesheet gave the
answers I expected. I'll test more to see whether it
fails in some cases.
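
To make the suspected failure concrete, here is a minimal sketch in Python (not XSLT), assuming the comparison relies only on the concatenated text of each document, which is what XPath's string value of an element gives. The helper name string_value and the two sample documents are made up for illustration; they are not from the stylesheet under discussion.

```python
import xml.etree.ElementTree as ET


def string_value(xml_text):
    """Concatenate all descendant text, like XPath string() on the root element."""
    return "".join(ET.fromstring(xml_text).itertext())


# Two structurally different documents (hypothetical examples).
doc_a = "<a><b>x</b><c>y</c></a>"
doc_b = "<a><b>xy</b><c/></a>"

same_string_value = string_value(doc_a) == string_value(doc_b)
same_markup = doc_a == doc_b

print("string values equal:", same_string_value)
print("markup equal:", same_markup)
```

If a comparison of this kind reports the two documents as equal, it has lost the information about where the element boundaries fall.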

> For a given element node, if you calculate an XPath to
> the current node and then use that XPath to find a node
> in the other document, you have two nodes. You then need
> to compare whether they are equal, but that is _exactly_
> the problem you are trying to solve. The earlier
> stylesheet just took the string value of the node, but
> that is just the concatenation of all the element
> content, so it loses most of the markup information.

I think you are right! (as always :) )

> What is wrong with the much simpler alternative of just
> writing out the string corresponding to a specific
> "canonical" linearisation, and then just comparing those
> two strings?

I think I should explore this option, but I believe
that converting an XML document to canonical form is
not a trivial task. For example, we need to convert
documents to UTF-8: if an XML document is encoded in
ISO-8859-1, its canonical representation will still be
in UTF-8. I think this cannot be easily accomplished
with XSLT (in fact, I think it may be impossible with
XSLT?). I think there are also other canonicalization
rules that cannot easily be implemented in XSLT.

I think it is probably easier to convert XML to
canonical form by using a SAX parser (of course, one
must know all the rules as well!).
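
As a rough illustration of the "canonicalize, then compare strings" idea done with a parser outside XSLT, here is a sketch in Python. It assumes Python 3.8 or later, whose standard library provides xml.etree.ElementTree.canonicalize (C14N 2.0). The helper name canonical and the sample documents are invented for the example.

```python
import xml.etree.ElementTree as ET


def canonical(xml_bytes):
    """Return the C14N 2.0 serialisation of a document as a Python str."""
    return ET.canonicalize(xml_data=xml_bytes)


# Same content, but different encoding, attribute order,
# quote style and empty-element syntax (hypothetical samples).
doc_latin1 = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<item b="2" a="1">caf\u00e9<note/></item>'
).encode("iso-8859-1")
doc_utf8 = (
    "<?xml version='1.0' encoding='UTF-8'?>"
    "<item a='1' b='2'>caf\u00e9<note></note></item>"
).encode("utf-8")

# A document that genuinely differs in an attribute value and in text.
doc_other = b"<item a='1' b='3'>cafe<note/></item>"

equal_docs = canonical(doc_latin1) == canonical(doc_utf8)
different_docs = canonical(doc_latin1) == canonical(doc_other)

print("latin1 vs utf8 equal:", equal_docs)
print("latin1 vs other equal:", different_docs)
```

Here the parser decodes each input according to its own XML declaration, so the original encoding no longer matters by the time the canonical strings are compared.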

Regards,
Mukul

> David