Re: [xsl] Testing 2 XML documents for equality - a solution

Subject: Re: [xsl] Testing 2 XML documents for equality - a solution
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 31 Mar 2005 17:21:48 -0500
At 02:51 PM 3/31/2005, you wrote:
> > >  Any activity to
> > >solve an undefined problem is groundless and imaginary -- something
> > >like hallucination.
> >
> > I can accept this provisionally, leaving aside a discussion of what we
> > should mean by "groundless", "imaginary", and "hallucination". (Though
> > perhaps not on XSL-List. :-) Whether any such activity is a waste of time
> > is, however, another question.
>
> I cannot see where I said it was a waste of time?

You didn't -- which is a good thing, since any number of the activities we engage in (such as helping people on XSL-List) might fall into the category of dealing with a problem not fully defined. :->


> And it's still very good to reflect on how futile it generally is to approach a programming problem like this.
>
> The main problem in this thread is to define the problem.

Quite so.


BTW, two threads are converging here ... Karl is asking about double-tree-traversal, while Mukul is thinking about how to compare two trees.

FWIW (Mukul I hope you're still reading :-), I once developed a moderately useful "document comparison" routine, but it was for a known, constrained document type, in which changes only in certain elements were of interest. This helped, since I could focus my string-comparisons on recognized subtrees, whose correspondence was determinable. Nonetheless the routine was limited to saying where there were apparent differences, not precisely what they were: this wasn't anything like an industrial-strength "diff".

Even apart from the datatype-related problems Mike cited (knowing that the number "1.0" is the same as the number "1"), and sticking just to string comparisons, this is a hard problem. Consider, for example, two divs whose paragraphs are transposed:

<div>
  <p>By 1997, best practices in the SGML industry were well understood....</p>
  <p>Work on XML started in earnest in 1997...</p>
</div>

<div>
  <p>Work on XML started in earnest in 1997...</p>
  <p>By 1997, best practices in the SGML industry were well understood....</p>
</div>

If you can establish that both divs are "supposed to be" the same, it's not hard to find that they're different: test="string($div1) = string($div2)" comes out false here.
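
For anyone who wants to try this outside a stylesheet, here is a minimal sketch in Python using only the standard library. It is a stand-in for the XPath test, not the test itself: the helper name string_value is made up for illustration, and it approximates string() on an element by concatenating all descendant text, whitespace included.

```python
import xml.etree.ElementTree as ET

DIV1 = """<div>
  <p>By 1997, best practices in the SGML industry were well understood....</p>
  <p>Work on XML started in earnest in 1997...</p>
</div>"""

DIV2 = """<div>
  <p>Work on XML started in earnest in 1997...</p>
  <p>By 1997, best practices in the SGML industry were well understood....</p>
</div>"""


def string_value(elem):
    """Concatenate all descendant text in document order (hypothetical helper,
    roughly what XPath string() returns for an element node)."""
    return "".join(elem.itertext())


div1 = ET.fromstring(DIV1)
div2 = ET.fromstring(DIV2)

# The two divs hold the same paragraphs in a different order,
# so their string values differ.
divs_equal = string_value(div1) == string_value(div2)
print(divs_equal)  # False
```

Note that this only says the divs differ somewhere; it says nothing about where or how.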

The same thing is rather harder at the level of the p. Which one is supposed to be which? Have they both changed, or neither?

This problem requires defining both what it means to be equal, and how you know that two nodes (elements, strings, tokens, information items, what have you) correspond and are to be compared to one another.
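
To make the dependence on correspondence concrete, here is a rough Python sketch (standard library only) that compares the same two divs under two different pairing rules. Both rules and all the function names are assumptions invented for illustration, not anything from a real diff tool: one pairs paragraphs by position, the other pairs paragraphs whose text is identical wherever they occur.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OLD = """<div>
  <p>By 1997, best practices in the SGML industry were well understood....</p>
  <p>Work on XML started in earnest in 1997...</p>
</div>"""

NEW = """<div>
  <p>Work on XML started in earnest in 1997...</p>
  <p>By 1997, best practices in the SGML industry were well understood....</p>
</div>"""


def paragraphs(xml_text):
    """Return the text of each p element, in document order."""
    return ["".join(p.itertext()) for p in ET.fromstring(xml_text).iter("p")]


def diff_by_position(old, new):
    """Pair the n-th paragraph with the n-th; return indexes whose text differs.
    (zip stops at the shorter list, so extra paragraphs are ignored here.)"""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


def diff_by_content(old, new):
    """Pair paragraphs with identical text regardless of position;
    return (removed, added) paragraphs that found no partner."""
    removed = Counter(old) - Counter(new)
    added = Counter(new) - Counter(old)
    return sorted(removed.elements()), sorted(added.elements())


old_ps, new_ps = paragraphs(OLD), paragraphs(NEW)
print(diff_by_position(old_ps, new_ps))  # [0, 1]: "both paragraphs changed"
print(diff_by_content(old_ps, new_ps))   # ([], []): "neither changed"
```

Same input, opposite answers: which one is "right" depends entirely on how you decide that two nodes correspond.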

Cheers,
Wendell


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
