RE: [xsl] Testing 2 XML documents for equality - a solution

Subject: RE: [xsl] Testing 2 XML documents for equality - a solution
From: Mukul Gandhi <mukul_gandhi@xxxxxxxxx>
Date: Sun, 3 Apr 2005 09:43:20 -0700 (PDT)

Hi Mike (and all),
  I have attempted to define the problem (of comparing
2 XML documents). It's in PDF form. Anyone may
access the file at
http://gandhimukul.tripod.com/comparing_xml_documents.pdf
(size approx. 198 KB).

Some spelling and grammar errors are expected. They
are unintentional, as is any disrespect that such
errors may convey.

I have kept various messages of this thread intact
below, to help anyone who wishes to understand the
background of the problem.
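
As a rough illustration only (this is not the XSLT
stylesheet from this thread), the scope described in
the quoted message below can be sketched in Python:
element, text and attribute nodes are compared, while
comments and processing instructions are skipped
because the standard xml.etree parser drops them by
default. The names elements_equal and documents_equal
are hypothetical, and treating attribute order as
insignificant and text as exact string matches are
assumptions of the sketch, not settled requirements.

```python
import xml.etree.ElementTree as ET


def elements_equal(a, b):
    """Hypothetical helper: deep-compare two parsed elements."""
    # Tags are expanded names ({namespace-uri}local), so prefixes are ignored.
    if a.tag != b.tag:
        return False
    # Dict comparison: same attribute names and values, order ignored (assumption).
    if a.attrib != b.attrib:
        return False
    # Exact string comparison of text; no whitespace or Unicode normalization.
    if (a.text or "") != (b.text or ""):
        return False
    if (a.tail or "") != (b.tail or ""):
        return False
    # Children must match pairwise, so child order is significant (assumption).
    if len(a) != len(b):
        return False
    return all(elements_equal(x, y) for x, y in zip(a, b))


def documents_equal(xml1, xml2):
    """Return True if the two XML strings are equal within the scope above."""
    return elements_equal(ET.fromstring(xml1), ET.fromstring(xml2))
```

This gives only a boolean answer; it does not report
where the two trees differ.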

I have not yet modified the XSLT I posted earlier.
I'll do so after receiving feedback on this work.

All suggestions, debates and corrections are welcome.

I am keeping my fingers crossed.

Regards,
Mukul

--- Michael Kay <mike@xxxxxxxxxxxx> wrote:

> You're still struggling a bit.
> 
> Let's start with requirements. What is this for? This is part of the
> difficulty: there are many reasons for wanting to compare two XML
> documents, and the different requirements don't necessarily lead to
> the same specification. If you describe some use cases this will help
> you on the way. For example, it will tell you whether it's enough to
> give a boolean answer, or whether you need to pinpoint where the two
> trees differ.
> 
> The next step is specification. This doesn't have to be mathematical,
> but it does have to be rigorous. Specifying it in terms of a
> comparison of two drawings of the trees being alike isn't going to be
> helpful. I know what you're getting at: you're trying to say that
> there's a one-to-one correspondence between the nodes and arcs in one
> tree and the nodes and arcs in the other. But you haven't said which
> properties of the nodes are important (namespace prefix? base URI?
> type annotation?), you haven't said how you will compare values
> (string comparison, with or without Unicode normalization? Collations?
> typed value comparison?), and you haven't said how you will handle the
> significance of ordering.
> 
> Finally, implementation (which is where you started). Before you
> embark on an implementation you should have an idea of the use cases
> (see above) and their performance requirements. For example, is the
> algorithm to be optimized for comparing trees that are probably the
> same or very similar, or for comparing trees that are likely to be
> wildly different?
> 
> Sorry if this is a bit severe: but you did ask for help.
> 
> Michael Kay
> http://www.saxonica.com/
> 
> 
> 
> > -----Original Message-----
> > From: Mukul Gandhi [mailto:mukul_gandhi@xxxxxxxxx]
> > Sent: 31 March 2005 22:49
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [xsl] Testing 2 XML documents for equality - a solution
> > 
> > Hi Dimitre,
> >   Below is the "scope" of my solution. My definition
> > of equality of XML documents consists of 2 parts:
> > 
> > Part 1) Node types that the stylesheet compares
> > -------
> > "XPath 1.0" trees define 7 kinds of nodes. These are
> > listed below. I have marked yes or no against each node
> > type, indicating whether my stylesheet has logic to
> > compare these nodes. If the XML documents contain nodes
> > of a kind marked "no", then my stylesheet may give a
> > wrong result (I have not done any testing for nodes
> > marked "no").
> > 
> > root nodes - yes
> > element nodes - yes
> > text nodes - yes
> > attribute nodes - yes
> > namespace nodes - no
> > processing instruction nodes - no
> > comment nodes - no
> > 
> > Part 2) My notion of equality of 2 XML documents
> > -------
> > Imagine that the XPath trees of the 2 documents are
> > *drawn on paper*. The diagram is similar to the XPath
> > tree diagram in Mike's book (XSLT 2nd Edition,
> > Programmer's Reference), page 57 (section "The Tree
> > Model").
> > 
> > If the XPath trees of the 2 XML documents "look the same"
> > on paper (as on page 57 of Mike's book), the documents
> > will be considered equal by my stylesheet.
> > 
> > The scope of my stylesheet presently covers only these
> > 2 points.
> > 
> > I don't claim any other capability for my stylesheet.
> > 
> > I have not attempted to define equality of XML documents
> > in mathematical terms (like relations, as you mentioned;
> > a subject I don't understand well) or in canonical
> > terms (as defined by the Canonical XML spec).
> > 
> > So considering the above scope of my work, can my
> > stylesheet be evaluated for correctness? 
> > 
> > I have deep regard for the people who participated in
> > this thread. They surely have deep knowledge of the
> > subject.
> > 
> > Regards,
> > Mukul
> > 
> > --- Dimitre Novatchev <dnovatchev@xxxxxxxxx> wrote:
> > > Hi Mukul,
> > > 
> > > 
> > > On Thu, 31 Mar 2005 04:36:32 -0800 (PST), Mukul Gandhi
> > > <mukul_gandhi@xxxxxxxxx> wrote:
> > > > Hi Dimitre,
> > > >  I am really not good at mathematics at this level. I
> > > > did study relations like "symmetric, reflexive and
> > > > transitive" some time back. But I did so just to score
> > > > grades. I had no idea then of their practical use. It is
> > > > indeed enlightening for me to know they have real
> > > > practical use (in XML & XSLT!). I cannot define my
> > > > problem in these terms, as my knowledge is limited.
> > > 
> > > This confirms the conclusion that here we see attempts at offering
> > > a solution to a problem that is not well defined.
> > > 
> > > How can we then judge the solution? 
> > > 
> > > > 
> > > > I would be happy if you can define in these precise
> > > > terms the problem I am trying to solve (based on my
> > > > earlier posts to this thread).
> > > 
> > > Impossible.
> > > 
> > > >  I'll keep it as a
> > > > reference for future use. I defined the problem (I am
> > > > trying to solve) from an average programmer's point of
> > > > view. And I think that it is quite understandable to
> > > > an average programmer ;)
> > > 
> > > A number of very wise people already explained why this is
> > > difficult to define -- they also found holes in your definition
> > > (and understanding) of the problem. These people obviously are
> > > not average programmers.
> > > 
> > > Cheers,
> > > Dimitre Novatchev.
> > > 
> > > 