Re: [xsl] Comparing nodes minus one child

Subject: Re: [xsl] Comparing nodes minus one child
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Fri, 28 Sep 2001 14:44:36 -0400

It seems to me you have two separable problems here. One is establishing how you know that one <A> "equals" another. This is what you first ran up against. It's an area of XSLT where care must be taken, because XSLT doesn't really compare node sets as such: it reduces the comparison to string comparisons (two node sets are equal if each contains a node whose string value equals that of a node in the other). Given a process that builds a Result Tree Fragment from each <A>, representing what it "is" for purposes of your comparison (for example, leaving out the <X> children), you can compare the RTFs. Since RTFs too are compared only by their string values, there is a potential problem: node hierarchies that are structured differently can be equal when considered as strings. (Mike warned about this.) The mode-based routine I suggested for building RTFs gave a way to work around that.
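
To illustrate the string-value pitfall, here is a minimal sketch of one way to build a comparison string that keeps element structure visible. It is not necessarily the routine referred to above: the mode name "signature", the bracket delimiters, and the use of xsl:strip-space are all assumptions made for this example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <!-- Assumption: indentation in the source should not affect equality -->
  <xsl:strip-space elements="*"/>

  <!-- Elements: write the name and explicit open/close markers, so that
       nesting survives the reduction to a string -->
  <xsl:template match="*" mode="signature">
    <xsl:text>[</xsl:text>
    <xsl:value-of select="name()"/>
    <!-- Attribute order is not significant in XML, so sort by name -->
    <xsl:apply-templates select="@*" mode="signature">
      <xsl:sort select="name()"/>
    </xsl:apply-templates>
    <xsl:text>|</xsl:text>
    <xsl:apply-templates select="node()" mode="signature"/>
    <xsl:text>]</xsl:text>
  </xsl:template>

  <!-- Attributes: name and value -->
  <xsl:template match="@*" mode="signature">
    <xsl:text> @</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>=</xsl:text>
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Text nodes contribute their value -->
  <xsl:template match="text()" mode="signature">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- <X>, comments and processing instructions contribute nothing -->
  <xsl:template match="X | comment() | processing-instruction()"
                mode="signature"/>

  <!-- Demo driver: print one signature per <A>, one per line -->
  <xsl:template match="/">
    <xsl:for-each select="//A">
      <xsl:apply-templates select="." mode="signature"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With this sketch, <A><B>x</B><C>y</C></A> and <A><B>xy</B></A> have the same plain string value ("xy") but different signatures, and two <A>s that differ only in their <X> children get the same signature. The scheme is not watertight: text that itself contains the delimiter characters could still produce a false match.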

That's problem #1, which I'm assuming you've got licked. Problem #2 is how to perform the actual comparison efficiently. You are quite right to suspect it's a cumbersome operation, since it seems that for each <A> in one group, you need to go through all the <A>s in the other group.

But maybe there is a way to optimize this just a bit. You don't actually need to go through *all* the <A>s in the second group for each <A> in the first, since your action, if you get a hit, is just to copy, and presumably you only need to do this once. Therefore you only need to go until you either (1) get a match, or (2) run out of <A>s in the second group to compare.

This suggests that a recursive solution is possible. Build the comparison RTF for the <A> from the first group, then pick up the first <A> in the second group and compare; if the two compare equal, copy it and quit. If not, and there's a next <A> in the second group, pick it up and repeat; if there isn't, you're done.
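
A rough sketch of that recursion, under stated assumptions: the first group is HOLDER/A in the main input, the second group comes from a hypothetical file second.xml, and the "remove" mode is the poster's routine with an xsl:copy added. The template name copy-first-match and its parameters are made up for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Comparison version of a node: a copy of everything except <X> -->
  <xsl:template match="node()|@*" mode="remove">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" mode="remove"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="X" mode="remove"/>

  <!-- Walk the candidates one at a time; stop at the first hit -->
  <xsl:template name="copy-first-match">
    <xsl:param name="wanted"/>      <!-- string value of the comparison RTF -->
    <xsl:param name="candidates"/>  <!-- remaining <A>s of the second group -->
    <xsl:if test="$candidates">
      <xsl:variable name="candidate">
        <xsl:apply-templates select="$candidates[1]" mode="remove"/>
      </xsl:variable>
      <xsl:choose>
        <xsl:when test="string($candidate) = string($wanted)">
          <!-- Hit: copy the whole <A>, <X> included, and quit -->
          <xsl:copy-of select="$candidates[1]"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- No hit: repeat with the rest of the second group -->
          <xsl:call-template name="copy-first-match">
            <xsl:with-param name="wanted" select="$wanted"/>
            <xsl:with-param name="candidates"
                            select="$candidates[position() &gt; 1]"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:if>
  </xsl:template>

  <!-- Driver: second.xml is a placeholder, resolved relative to the stylesheet -->
  <xsl:template match="/">
    <matches>
      <xsl:for-each select="HOLDER/A">
        <xsl:variable name="wanted">
          <xsl:apply-templates select="." mode="remove"/>
        </xsl:variable>
        <xsl:call-template name="copy-first-match">
          <xsl:with-param name="wanted" select="string($wanted)"/>
          <xsl:with-param name="candidates"
                          select="document('second.xml')/HOLDER/A"/>
        </xsl:call-template>
      </xsl:for-each>
    </matches>
  </xsl:template>

</xsl:stylesheet>
```

Note that this compares plain string values of the RTFs, so the caveat about differently structured hierarchies still applies, and whitespace differences between the two files will count as differences. On a processor that does not optimize tail calls, a very long second group could also make the recursion deep.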

You haven't provided enough code infrastructure for me to be able to code this up properly, but that's okay, since I don't really have time to do that anyway. There are some experts in recursive processing on this list who could certainly implement this (and even come up with more efficient alternatives).

Another thing to consider would be a two-pass solution: that could be quicker too. The first pass would reduce both your sets of <A>s to their comparison version, so you could compare them directly, and not have to reduce the second set multiple times (that is, each time you have a new <A> from the first set to compare). The second pass would do the comparison, and when there's a hit, go back to the first source document (using the document() function probably) to get the entire <A>. This can also be done in a single stylesheet if you are willing to use a node-set extension function.
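
The single-stylesheet variant might look something like the sketch below. It assumes a processor that supports the EXSLT exsl:node-set() function (other processors offer an equivalent under their own namespace), and again uses a hypothetical second.xml for the second set; the <sig> wrapper element and the variable names are invented for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <!-- Comparison version of a node: a copy of everything except <X> -->
  <xsl:template match="node()|@*" mode="remove">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" mode="remove"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="X" mode="remove"/>

  <!-- "Pass 1": reduce every <A> of the second set exactly once.
       second.xml is a placeholder file name. -->
  <xsl:variable name="second-reduced-rtf">
    <xsl:for-each select="document('second.xml')/HOLDER/A">
      <sig>
        <xsl:apply-templates select="." mode="remove"/>
      </sig>
    </xsl:for-each>
  </xsl:variable>

  <!-- Turn the RTF into a real node set so its <sig> children can be addressed -->
  <xsl:variable name="second-reduced"
                select="exsl:node-set($second-reduced-rtf)/sig"/>

  <!-- "Pass 2": reduce each <A> of the first set once and test it
       against all the stored reductions in a single comparison -->
  <xsl:template match="/">
    <matches>
      <xsl:for-each select="HOLDER/A">
        <xsl:variable name="wanted">
          <xsl:apply-templates select="." mode="remove"/>
        </xsl:variable>
        <!-- node-set = string is true if any <sig> has that string value -->
        <xsl:if test="$second-reduced = string($wanted)">
          <!-- Hit: output the entire original <A>, <X> included -->
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </matches>
  </xsl:template>

</xsl:stylesheet>
```

Here the node-set comparison does the looping, so each <A> is reduced only once per set. The comparison is still on string values, so the earlier caveat about structure applies unless the reduction encodes structure in the string.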

I hope I've understood your problem well enough to suggest ways to go about it.


At 12:36 PM 9/28/01, you wrote:
Assume that I have two collections of <A> elements (stored in variables)
that I have gotten from two different files.
The collections look something like the following

I want to see if each of the <A> elements in the <HOLDER> (from the first
collection) equals an <A> in the <HOLDER> (from the second collection), but
I want the comparison to include only the <A> with all of its children
EXCEPT the <X> element.
My goal is to output the entire <A> element (including the <X> element).

I can get rid of the <X> elements by calling the following templates:
<xsl:template match="node()|@*" mode="remove">
     <xsl:copy>
          <xsl:apply-templates select="node()|@*" mode="remove"/>
     </xsl:copy>
</xsl:template>
<xsl:template match="X" mode="remove">
     <!-- do nothing, we don't want this element in the result -->
</xsl:template>

I know that I can convert each set of <A> elements in both collections to
the compressed version.  Then I would have to iterate through all of the
original <A> elements, compressing each one individually, and then comparing
it to the results of the compressed second collection. (I also need to do it
going through the second list as well, since I am doing Adds, Deletes,
etc.).  Is there a way to do this more efficiently?

Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.      
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
