
Subject: [xsl] Locating arbitrary duplicate structure
From: "Daniel Bowen" <dbowen2@xxxxxxxxx>
Date: Wed, 10 Jan 2001 12:28:50 -0700
(Thanks for the replies on ends-with, and on variables/parameters in the match
attribute of xsl:key.)

Here's another issue that I'm facing.  I'm hoping for some input on possible
approaches I can take.


Let's say I have XML with a well-defined schema, but an arbitrary hierarchy.
Using XSLT, I need to identify branches that are identical, or that differ
by no more than a given number of attributes. I don't mind if the approach
depends on an extension script.

As a simple example (although it doesn't necessarily demonstrate the
arbitrary hierarchy), let's say I have the XML:

 <LinearFeatureModel name="Light Poles">
  <Composite name="Composite">
   <OffsetPath name="Offset to the Left" offset="-3.6">
    <RegPopLinear name="Regularly place the poles" spacing="6">
     <Point name="Pole" relative="1" model="pole.flt" />
    </RegPopLinear>
   </OffsetPath>
   <OffsetPath name="Offset to the Right" offset="3.6">
    <RegPopLinear name="Regularly place the poles" spacing="6">
     <Point name="Pole" relative="1" model="pole.flt" />
    </RegPopLinear>
   </OffsetPath>
  </Composite>
 </LinearFeatureModel>

The branch (we'll call it 'branch 1')
      <Point name="Pole" relative="1" model="pole.flt" />
is found twice.

However, 'branch 2':
     <RegPopLinear name="Regularly place the poles" spacing="6">
      <Point name="Pole" relative="1" model="pole.flt" />
     </RegPopLinear>
is also found exactly twice, and includes branch 1.

The two branches starting at the "OffsetPath" nodes are very similar, but
they differ in the values of both the "name" attribute and the "offset"
attribute. I'll call the first OffsetPath branch 'branch 3', and the second
'branch 4'.
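One way "very similar" could be measured, sketched in Python since an extension script is acceptable: walk two branches in parallel and, provided their tag names and child counts match, list every attribute whose value differs. The helper name attribute_differences is hypothetical, children are paired by position (an assumption), and text content is ignored. For branches 3 and 4 it should report exactly the two differences described above.

```python
import xml.etree.ElementTree as ET

# Branches 3 and 4 from the example, wrapped in their Composite parent.
SAMPLE = """<Composite name="Composite">
  <OffsetPath name="Offset to the Left" offset="-3.6">
   <RegPopLinear name="Regularly place the poles" spacing="6">
    <Point name="Pole" relative="1" model="pole.flt" />
   </RegPopLinear>
  </OffsetPath>
  <OffsetPath name="Offset to the Right" offset="3.6">
   <RegPopLinear name="Regularly place the poles" spacing="6">
    <Point name="Pole" relative="1" model="pole.flt" />
   </RegPopLinear>
  </OffsetPath>
 </Composite>"""


def attribute_differences(a, b):
    """Hypothetical helper: compare two branches node by node.

    Returns None if the branches differ in shape (a tag name or a child
    count differs). Otherwise returns a list of
    (path, attribute, value_in_a, value_in_b) tuples, where a missing
    attribute shows up as the value None."""
    diffs = []

    def walk(x, y, path):
        if x.tag != y.tag or len(x) != len(y):
            return False
        here = f"{path}/{x.tag}"
        # Union of names catches attributes present on only one side.
        for name in sorted(set(x.attrib) | set(y.attrib)):
            if x.get(name) != y.get(name):
                diffs.append((here, name, x.get(name), y.get(name)))
        # Children are paired by position; all() stops at the first mismatch.
        return all(walk(cx, cy, here) for cx, cy in zip(x, y))

    return diffs if walk(a, b, "") else None


if __name__ == "__main__":
    left, right = ET.fromstring(SAMPLE).findall("OffsetPath")
    diffs = attribute_differences(left, right)
    print(len(diffs))  # 2
    for d in diffs:
        print(d)
    # ('/OffsetPath', 'name', 'Offset to the Left', 'Offset to the Right')
    # ('/OffsetPath', 'offset', '-3.6', '3.6')
```

A threshold such as "differ by at most 2" is then just len(diffs) <= 2. To avoid comparing every pair, candidate pairs could first be grouped on a key that keeps tag names and attribute names but drops attribute values.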

I'd like to be able to detect:
* that branch 2 occurs twice
* that branch 1 is a sub-part of branch 2
* that branches 3 and 4 are similar, and differ by 2 attributes or attribute
values.


There is already a first cut of a solution in place that I'm trying to
replace (done by someone else :-) ).  It only recognizes branches that are
exact duplicates, and it does not recognize when a duplicated sub-branch lies
inside a higher branch that includes it (where the higher branch is also
identical in all cases).  It is also extremely inefficient (it's at least
O(n^2), if not worse): it is essentially a nested loop that compares the
stringized XML of each node (with all its descendants) against that of every
other node (and its descendants).
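One alternative to the nested loop, sketched in Python under the assumption that an extension script or a preprocessing step is acceptable: give every distinct subtree structure a small integer id in a single post-order pass (a hash-consing technique, not what the existing solution does), so identical branches collide in a dictionary instead of being compared pairwise. A repeated group whose copies all sit under parents that are themselves one repeated structure is reported as a sub-part of that structure, which covers the branch 1 / branch 2 case. The function name find_repeated_branches is hypothetical, and text content is ignored.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<LinearFeatureModel name="Light Poles">
 <Composite name="Composite">
  <OffsetPath name="Offset to the Left" offset="-3.6">
   <RegPopLinear name="Regularly place the poles" spacing="6">
    <Point name="Pole" relative="1" model="pole.flt" />
   </RegPopLinear>
  </OffsetPath>
  <OffsetPath name="Offset to the Right" offset="3.6">
   <RegPopLinear name="Regularly place the poles" spacing="6">
    <Point name="Pole" relative="1" model="pole.flt" />
   </RegPopLinear>
  </OffsetPath>
 </Composite>
</LinearFeatureModel>"""


def find_repeated_branches(root):
    """Hypothetical helper: return (tag, occurrences, enclosing_tag) for each
    subtree structure that occurs at least twice. enclosing_tag names the
    repeated parent structure that always contains it, or is None if the
    group is not part of a larger repeated branch."""
    structure_ids = {}            # (tag, attributes, child ids) -> int id
    node_id = {}                  # element -> id of its whole subtree
    parent = {}                   # element -> parent element
    members = defaultdict(list)   # id -> elements sharing that structure

    def visit(elem):
        # Post-order: children get ids before their parent builds its key.
        child_ids = []
        for child in elem:
            parent[child] = elem
            visit(child)
            child_ids.append(node_id[child])
        # Sorted attributes make attribute order irrelevant; text is ignored
        # (assumption: the markup carries its data in attributes).
        key = (elem.tag, tuple(sorted(elem.attrib.items())), tuple(child_ids))
        node_id[elem] = structure_ids.setdefault(key, len(structure_ids))
        members[node_id[elem]].append(elem)

    visit(root)

    report = []
    for elems in members.values():
        if len(elems) < 2:
            continue  # this structure occurs only once
        parent_ids = {node_id.get(parent.get(e)) for e in elems}
        enclosing = None
        if len(parent_ids) == 1:
            (pid,) = parent_ids
            # Every copy sits inside the same parent structure, and that
            # structure is itself repeated, so this group is a sub-part.
            if pid is not None and len(members[pid]) >= 2:
                enclosing = members[pid][0].tag
        report.append((elems[0].tag, len(elems), enclosing))
    return report


if __name__ == "__main__":
    for tag, count, enclosing in find_repeated_branches(ET.fromstring(SAMPLE)):
        print(tag, count, enclosing)
    # Point 2 RegPopLinear
    # RegPopLinear 2 None
```

Because each key holds the children's ids rather than a re-serialized copy of all descendants, the work should be roughly proportional to the number of nodes and attributes, instead of quadratic.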

What are some other approaches that I could take?  Thanks!

-Daniel


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

