RE: [xsl] Testing 2 XML documents for equality - a solution

Subject: RE: [xsl] Testing 2 XML documents for equality - a solution
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 31 Mar 2005 23:09:18 +0100
You're still struggling a bit.

Let's start with requirements. What is this for? This is part of the
difficulty: there are many reasons for wanting to compare two XML documents,
and the different requirements don't necessarily lead to the same
specification. If you describe some use cases this will help you on the way.
For example, it will tell you whether it's enough to give a boolean answer,
or whether you need to pinpoint where the two trees differ.
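
To make the boolean-versus-pinpoint distinction concrete, here is a minimal sketch in Python (not XSLT, and not the stylesheet under discussion). The function name `first_difference` and the shape of its report are hypothetical. It assumes document order is significant and that names, attributes and text are compared as plain strings. The boolean answer falls out of the pinpointing version for free; the reverse is not true.

```python
import xml.etree.ElementTree as ET


def first_difference(a, b, path=None):
    """Return a description of the first difference found, or None if equal.

    Assumptions for illustration: sibling order matters, and names,
    attributes and text are compared as raw strings.
    """
    path = path or "/" + a.tag
    if a.tag != b.tag:
        return f"{path}: element name differs ({a.tag!r} vs {b.tag!r})"
    if a.attrib != b.attrib:
        return f"{path}: attributes differ"
    if (a.text or "") != (b.text or ""):
        return f"{path}: text differs"
    if len(a) != len(b):
        return f"{path}: child count differs ({len(a)} vs {len(b)})"
    for i, (ca, cb) in enumerate(zip(a, b), start=1):
        child_path = f"{path}/*[{i}]"
        found = first_difference(ca, cb, child_path)
        if found is not None:
            return found
        if (ca.tail or "") != (cb.tail or ""):
            return f"{child_path}: text following this element differs"
    return None


def documents_equal(a, b):
    # The boolean answer is just "no difference was found".
    return first_difference(a, b) is None


if __name__ == "__main__":
    left = ET.fromstring("<a><b>1</b><c>2</c></a>")
    right = ET.fromstring("<a><b>1</b><c>3</c></a>")
    print(first_difference(left, right))
    print(documents_equal(left, left))
```

Which of the two you need depends on the use case: a regression test may be content with the boolean, while a tool for human reviewers needs the location.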

The next step is specification. This doesn't have to be mathematical, but it
does have to be rigorous. Specifying equality as two drawings of the trees
looking alike isn't going to be helpful. I know what
you're getting at: you're trying to say that there's a one-to-one
correspondence between the nodes and arcs in one tree and the nodes and arcs
in the other. But you haven't said which properties of the nodes are
important (namespace prefix? base URI? type annotation?), you haven't said
how you will compare values (string comparison, with or without Unicode
normalization? Collations? typed value comparison?), and you haven't said
how you will handle the significance of ordering.
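
One way to see why these choices matter is to write them down as explicit parameters. The following is a rough sketch in Python, not a reference implementation: the function names `canonical` and `deep_equal` and the three keyword options are made up for illustration. It also inherits decisions from the parser: `xml.etree.ElementTree` by default discards comments and processing instructions and replaces namespace prefixes with the namespace URI, so prefixes are not compared at all. Base URI and type annotations are not modelled.

```python
import unicodedata
import xml.etree.ElementTree as ET


def canonical(elem, *, unicode_form="NFC", ignore_blank_text=True, ordered=True):
    """Reduce an element to nested tuples under an explicit comparison policy.

    unicode_form:      normalization form applied to values, or None for
                       raw string comparison
    ignore_blank_text: treat whitespace-only text as if it were absent
    ordered:           whether the order of child elements is significant
    """
    def norm(s):
        s = s or ""
        return unicodedata.normalize(unicode_form, s) if unicode_form else s

    def text(s):
        s = s or ""
        return "" if ignore_blank_text and not s.strip() else norm(s)

    def build(e):
        # Attribute order is never significant, so always sort attributes.
        attrs = tuple(sorted((k, norm(v)) for k, v in e.attrib.items()))
        kids = [(build(c), text(c.tail)) for c in e]
        if not ordered:
            kids.sort()
        return (e.tag, attrs, text(e.text), tuple(kids))

    return build(elem)


def deep_equal(xml_a, xml_b, **policy):
    """Compare two XML strings under the same policy."""
    return canonical(ET.fromstring(xml_a), **policy) == canonical(
        ET.fromstring(xml_b), **policy
    )


if __name__ == "__main__":
    print(deep_equal("<a><b/><c/></a>", "<a><c/><b/></a>"))
    print(deep_equal("<a><b/><c/></a>", "<a><c/><b/></a>", ordered=False))
```

The same pair of documents is unequal when order is significant and equal when it is not, so "are they equal?" has no answer until the policy is fixed.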

Finally, implementation (which is where you started). Before you embark on
an implementation you should have an idea of the use cases (see above) and
their performance requirements. For example, is the algorithm to be
optimized for comparing trees that are probably the same or very similar, or
for comparing trees that are likely to be wildly different?

Sorry if this is a bit severe, but you did ask for help.

Michael Kay
http://www.saxonica.com/



> -----Original Message-----
> From: Mukul Gandhi [mailto:mukul_gandhi@xxxxxxxxx] 
> Sent: 31 March 2005 22:49
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Testing 2 XML documents for equality - a solution
> 
> Hi Dimitre,
>   Below is the "scope" of my solution. My definition
> of equality of XML documents consists of 2 parts:
> 
> Part 1) Node types that the stylesheet compares
> -------
> "XPath 1.0" trees define 7 kinds of nodes. These are
> listed below. I have marked yes or no against each node
> type, indicating whether my stylesheet has logic to
> compare these nodes. If the XML documents contain nodes of
> a kind marked "no", then my stylesheet may give a
> wrong result (I have not done any testing for nodes
> marked "no").
> 
> root nodes - yes
> element nodes - yes
> text nodes - yes
> attribute nodes - yes
> namespace nodes - no
> processing instruction nodes - no
> comment nodes - no
> 
> Part 2) My notion of equality of 2 XML documents
> -------
> Imagine that the XPath trees of the 2 documents are *drawn
> on paper*. The diagram is similar to the XPath
> tree diagram in Mike's book (XSLT 2nd Edition,
> Programmer's Reference), page 57 (section "The Tree
> Model").
> 
> If the XPath trees of the 2 XML documents "look the same" on
> paper (as on page 57 of Mike's book), the documents
> will be considered equal by my stylesheet.
> 
> The scope of my stylesheet presently covers only these
> 2 points.
> 
> I don't claim any other capability from my stylesheet.
> 
> I have not attempted to equate the XML documents in
> mathematical terms (like relations, as you
> mentioned; a subject I don't understand well) or
> canonical terms (as defined by the Canonical XML spec).
> 
> So considering the above scope of my work, can my
> stylesheet be evaluated for correctness? 
> 
> I have deep regard for the people who participated in this
> thread. They surely have deep knowledge of the
> subject.
> 
> Regards,
> Mukul
> 
> --- Dimitre Novatchev <dnovatchev@xxxxxxxxx> wrote:
> > Hi Mukul,
> > 
> > 
> > On Thu, 31 Mar 2005 04:36:32 -0800 (PST), Mukul Gandhi
> > <mukul_gandhi@xxxxxxxxx> wrote:
> > > Hi Dimitre,
> > >  I am really not good at mathematics at this level. I
> > > did study relations like "symmetric, reflexive
> > > and transitive" some time back. But I did so just to score
> > > grades. I had no idea then of their practical use. It is
> > > indeed enlightening for me to know they have real
> > > practical use (in XML & XSLT!). I cannot define my
> > > problem in these terms, as my knowledge is limited.
> > 
> > This confirms the conclusion that here we see attempts at
> > offering a solution to a problem that is not well defined.
> > 
> > How can we then judge the solution? 
> > 
> > > 
> > > I would be happy if you can define in these precise
> > > terms the problem I am trying to solve (based on my
> > > earlier posts to this thread).
> > 
> > Impossible.
> > 
> > >  I'll keep it as a
> > > reference for future use. I defined the problem (I am
> > > trying to solve) from an average programmer's point of
> > > view. And I think that it is quite understandable to
> > > an average programmer ;)
> > 
> > A number of very wise people already explained why this is
> > difficult to define -- they also found holes in your definition
> > (and understanding) of the problem. These people obviously are
> > not average programmers.
> > 
> > Cheers,
> > Dimitre Novatchev.
> > 
> > 
> 
> 
