RE: [xsl] Testing 2 XML documents for equality - a solution

From: Mukul Gandhi <mukul_gandhi@xxxxxxxxx>
Date: Fri, 1 Apr 2005 08:47:18 -0800 (PST)
Hi Mike,
  Thanks for your response. Sorry for the late reply;
I was not sure how to frame it. Handling your and
Dimitre's comments is getting difficult for me!

I am taking up defining my problem of "equating 2 XML
documents" as a separate assignment! I'll define my
problem first, and then do the XSLT coding for my
"equivalence definition".

I think there will be 3 aspects of the definition:

1) An equivalence definition of trees from a pure
computer science point of view, probably in terms of
sets and relations.

2) An equivalence definition in terms of "XPath trees".
XPath nodes have characteristics like namespace
prefix, base URI, type annotation etc., as you said
below.
I'll try to map definitions 2) with 3) and generate a
mixture :)

3) An equivalence definition from a physical storage
point of view. I'll try to do this, but I am not sure
whether I'll succeed. I am not sure whether different
operating systems store text files (which is what XML
documents are) in different ways. Can I compare a byte
stream on Unix with a byte stream on Windows? And can
I define equivalence at the hardware level, i.e.
storage in memory locations? (-:
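
On the byte-stream question in 3), here is a minimal sketch in Python
(not part of the original message). It assumes the only difference
between the Unix and Windows copies of a file is the line-ending
convention (LF versus CRLF). XML 1.0 parsers normalize CRLF and lone CR
to LF on input, so two files can differ byte for byte and still present
the same characters to an application. The function name and the sample
documents are hypothetical.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Map CRLF and lone CR line endings to LF (assumed to be the only OS difference)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# The same hypothetical document as it might be saved on Unix and on Windows.
unix_bytes = b"<a>\n  <b>text</b>\n</a>\n"
windows_bytes = b"<a>\r\n  <b>text</b>\r\n</a>\r\n"

# A raw byte comparison sees two different streams.
raw_equal = unix_bytes == windows_bytes

# After line-ending normalization the two streams are identical.
normalized_equal = normalize_newlines(unix_bytes) == normalize_newlines(windows_bytes)

print(raw_equal, normalized_equal)
```

A byte-level definition would also have to say something about
character encodings and byte order marks, which this sketch does not
attempt.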

Studying 1) and 2) will be my priority. 
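
For definition 2), here is a minimal sketch of one possible equivalence
definition. It is written in Python rather than XSLT so that it can be
run on its own. Every choice in it is an assumption made for
illustration, and it is not the stylesheet discussed in this thread:
element names are compared by namespace URI and local name (prefixes
are ignored), attributes are compared as an unordered set, the order of
child elements is significant, whitespace-only text can optionally be
ignored, and comments and processing instructions are not compared
because ElementTree's default parser skips them. The helper names
text_of, trees_equal and documents_equal are hypothetical.

```python
import xml.etree.ElementTree as ET


def text_of(value, ignore_whitespace):
    """Return the text to compare; whitespace-only text optionally counts as empty."""
    text = value or ""
    if ignore_whitespace and not text.strip():
        return ""
    return text


def trees_equal(a, b, ignore_whitespace=True):
    """Compare two elements recursively under the assumptions stated above."""
    # Tags are "{namespace-uri}local-name", so namespace prefixes play no part.
    # attrib is a dict, so attribute order plays no part either.
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    # Text before the first child, and text following the element itself.
    if text_of(a.text, ignore_whitespace) != text_of(b.text, ignore_whitespace):
        return False
    if text_of(a.tail, ignore_whitespace) != text_of(b.tail, ignore_whitespace):
        return False
    # Child elements must match pairwise, in document order.
    if len(a) != len(b):
        return False
    return all(trees_equal(x, y, ignore_whitespace) for x, y in zip(a, b))


def documents_equal(xml_a, xml_b, ignore_whitespace=True):
    """Parse two XML strings and compare their root elements."""
    return trees_equal(ET.fromstring(xml_a), ET.fromstring(xml_b), ignore_whitespace)


doc1 = '<p:order xmlns:p="urn:x" id="1" status="new"><p:item>pen</p:item></p:order>'
doc2 = ('<q:order xmlns:q="urn:x" status="new" id="1">\n'
        '  <q:item>pen</q:item>\n'
        '</q:order>')
doc3 = '<p:order xmlns:p="urn:x" id="1" status="new"><p:item>ink</p:item></p:order>'

print(documents_equal(doc1, doc2))   # differs only in prefix, attribute order, indentation
print(documents_equal(doc1, doc3))   # differs in element content
print(documents_equal(doc1, doc2, ignore_whitespace=False))
```

Each assumption corresponds to one of the open questions in the quoted
reply below (which node properties matter, how values are compared,
whether order is significant). A different use case could reasonably
answer each of them differently.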

I'll do this work at my own pace. I'll get back to
the list when I am ready!

Regards,
Mukul

--- Michael Kay <mike@xxxxxxxxxxxx> wrote:

> You're still struggling a bit.
> 
> Let's start with requirements. What is this for? This is part of the
> difficulty: there are many reasons for wanting to compare two XML
> documents, and the different requirements don't necessarily lead to
> the same specification. If you describe some use cases this will help
> you on the way. For example, it will tell you whether it's enough to
> give a boolean answer, or whether you need to pinpoint where the two
> trees differ.
> 
> The next step is specification. This doesn't have to be mathematical,
> but it does have to be rigorous. Specifying it in terms of a
> comparison of two drawings of the trees being alike isn't going to be
> helpful. I know what you're getting at: you're trying to say that
> there's a one-to-one correspondence between the nodes and arcs in one
> tree and the nodes and arcs in the other. But you haven't said which
> properties of the nodes are important (namespace prefix? base uri?
> type annotation?), you haven't said how you will compare values
> (string comparison, with or without Unicode normalization?
> Collations? typed value comparison?), and you haven't said how you
> will handle the significance of ordering.
> 
> Finally, implementation (which is where you started). Before you
> embark on an implementation you should have an idea of the use cases
> (see above) and their performance requirements. For example, is the
> algorithm to be optimized for comparing trees that are probably the
> same or very similar, or for comparing trees that are likely to be
> wildly different?
> 
> Sorry if this is a bit severe: but you did ask for
> help. 
> 
> Michael Kay
> http://www.saxonica.com/
> 
> 
> 
> > -----Original Message-----
> > From: Mukul Gandhi [mailto:mukul_gandhi@xxxxxxxxx]
> > Sent: 31 March 2005 22:49
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [xsl] Testing 2 XML documents for equality - a solution
> > 
> > Hi Dimitre,
> >   Below is the "scope" of my solution. My definition
> > of equality of XML documents consists of 2 parts:
> > 
> > Part 1) Node types that the stylesheet compares
> > -------
> > "XPath 1.0" trees define 7 kinds of nodes. These are
> > listed below. I have marked yes or no against each node
> > type, indicating whether my stylesheet has logic to
> > compare such nodes. If the XML documents contain nodes
> > of a kind marked "no", my stylesheet may give a wrong
> > result (I have not done any testing for nodes marked
> > "no").
> > 
> > root nodes - yes
> > element nodes - yes
> > text nodes - yes
> > attribute nodes - yes
> > namespace nodes - no
> > processing instruction nodes - no
> > comment nodes - no
> > 
> > Part 2) My notion of equality of 2 XML documents
> > -------
> > Imagine that the XPath trees of the 2 documents are
> > *drawn on paper*. The diagram is similar to the XPath
> > tree diagram in Mike's book (XSLT 2nd Edition,
> > Programmer's Reference), page 57 (section "The Tree
> > Model").
> > 
> > If the XPath trees of the 2 XML documents "look the
> > same" on paper (as on page 57 of Mike's book), the
> > documents will be considered equal by my stylesheet.
> > 
> > The scope of my stylesheet presently covers only these
> > 2 points.
> >
> > I don't claim any other capability from my stylesheet.
> > 
> > I have not attempted to equate the XML documents in
> > mathematical terms (like relations, as you mentioned;
> > a subject I don't understand well) or in canonical
> > terms (as defined by the Canonical XML spec).
> > 
> > So considering the above scope of my work, can my
> > stylesheet be evaluated for correctness? 
> > 
> > I have deep regard for the people who participated in
> > this thread. They surely have deep knowledge of the
> > subject.
> > 
> > Regards,
> > Mukul
> > 
> > --- Dimitre Novatchev <dnovatchev@xxxxxxxxx> wrote:
> > > Hi Mukul,
> > > 
> > > 
> > > On Thu, 31 Mar 2005 04:36:32 -0800 (PST), Mukul Gandhi
> > > <mukul_gandhi@xxxxxxxxx> wrote:
> > > > Hi Dimitre,
> > > >  I am really not good at mathematics at this level. I
> > > > did study relations like "symmetric, reflexive and
> > > > transitive" some time back, but I did so just to score
> > > > grades. I had no idea then of their practical use. It
> > > > is indeed enlightening for me to know they have real
> > > > practical use (in XML & XSLT!). I cannot define my
> > > > problem in these terms, as my knowledge is limited.
> > > 
> > > This confirms the conclusion that here we see attempts at
> > > offering a solution to a problem that is not well defined.
> > > 
> > > How can we then judge the solution? 
> > > 
> > > > 
> > > > I would be happy if you can define in these precise
> > > > terms the problem I am trying to solve (based on my
> > > > earlier posts to this thread).
> > > 
> > > Impossible.
> > > 
> > > >  I'll keep it as a reference for future use. I defined
> > > > the problem (I am trying to solve) from an average
> > > > programmer's point of view. And I think that it is
> > > > quite understandable to an average programmer ;)
> > > 
> > > A number of very wise people already explained why this is
> > > difficult to define -- they also found holes in your
> > > definition (and understanding) of the problem. These people
> > > obviously are not average programmers.
> > > 
> > > Cheers,
> > > Dimitre Novatchev.
> > > 
> > > 
> > 
> > 
> > 		
> 
> 



		
