RE: [xsl] Spotting "cousin marriages" in a tree

Subject: RE: [xsl] Spotting "cousin marriages" in a tree
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 28 Jul 2004 23:57:33 +0100
You may be hoping for too much. Graph algorithms such as looking for cycles
often have complexity of O(n^2) or worse, whatever language they are
implemented in.

Michael Kay 

> -----Original Message-----
> From: Phil Endecott [mailto:spam_from_xslt_list@xxxxxxxxxxxx] 
> Sent: 28 July 2004 22:19
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Spotting "cousin marriages" in a tree
> 
> Dear XSLT experts,
> 
> I have a flat input like this:
> 
> <things>
>    <thing id="1"> <child idref="2"/> <child idref="3"/> </thing>
>    <thing id="2"/>
>    <thing id="3"/>
> </things>
> 
> I want to match up the ids with the idrefs to build a tree that looks 
> like this:
> 
> <tree id="1">
>    <tree id="2"/>
>    <tree id="3"/>
> </tree>
> 
> I could do it like this:
> 
> <xsl:template match="thing">
>    <tree>
>      <xsl:for-each select="child">
>        <xsl:apply-templates
>             select="/things/thing[@id=current()/@idref]"/>
>      </xsl:for-each>
>    </tree>
> </xsl:template>
> 
> But that is O(n^2), so I'd instead do it like this:
> 
> <xsl:key name="things-by-id" match="thing" use="@id"/>
> <xsl:template match="thing">
>    <tree>
>      <xsl:for-each select="child">
>        <xsl:apply-templates select="key('things-by-id',@idref)"/>
>      </xsl:for-each>
>    </tree>
> </xsl:template>
> 
> which is O(n log n).
> 
> This is all fine.  But very rarely my input may not describe a pure 
> tree; it may contain "cousin marriages" or more "incestuous" 
> relationships like this:
> 
> <things>
>    <thing id="1"> <child idref="2"/> <child idref="3"/> </thing>
>    <thing id="2"> <child idref="4"/> </thing>
>    <thing id="3"> <child idref="4"/> </thing>
>    <thing id="4"/>
> </things>
> 
> If you apply either of the above templates to this, thing 4 
> appears in 
> the output twice:
> 
> <tree id="1">
>    <tree id="2">
>      <tree id="4"/>
>    </tree>
>    <tree id="3">
>      <tree id="4">
>    </tree>
> </tree>
> 
> which is very bad in my application.  What I need is 
> something like this:
> 
> <tree id="1">
>    <tree id="2">
>      <tree id="4"/>
>    </tree>
>    <tree id="3">
>      <error/>
>    </tree>
> </tree>
> 
> I can't afford for this to become an O(n^2) operation.
> 
> The obvious thought is that I need to check at the point of recursion 
> whether the thing I am about to process has already been output.  But 
> there is no "preceding-in-the-output" axis or other way I can 
> think of 
> to note which things have already been output.
> 
> An alternative is to apply a test to the other input elements 
> to see if 
> other things have the same thing as this one as a child, but 
> that fails 
> because my input may legitimately contain redundant nodes like this:
> 
> <things>
>    <thing id="1"> <child idref="2"/> <child idref="3"/> </thing>
>    <thing id="2"/>
>    <thing id="3"/>
>    <thing id="9"> <child idref="2"/> </thing>
> </things>
> 
> In this case I want the same output as the very first example, i.e. 
> thing 9 doesn't appear in the output at all.
> 
> I can do a multi-pass process using exsl:node-set(), 
> generating the tree 
> with the duplicates in it and then deleting them if they are 
> duplicates, 
> i.e. something like this:
> 
> <xsl:template match="/">
>    <xsl:variable name="n">
>      <xsl:apply-templates select="things/thing[1]"/>  (as above)
>    </xsl:variable>
>    <xsl:apply-templates select="exsl:node-set($n)" 
> mode="remove-dupes"/>
> </xsl:template>
> 
> <xsl:template match="tree" mode="remove-dupes">
>    <xsl:choose>
>      <xsl:when test="preceding::tree[@id=current/@id]">
>        <error/>
>      </xsl:when>
>      <xsl:otherwise>
>        <tree id="{@id}">
>          <xsl:apply-templates mode="remove-dupes"/>
>        </tree>
>      </xsl:otherwise>
>    </xsl:choose>
> </xsl:template>
> 
> There are two problems with this.  First, generating the intermediate 
> result could run forever if the input is seriously malformed (e.g. a 
> loop, child refers to parent); checking as it is generated 
> would avoid 
> this.  Second, the test "preceding::tree[@id=current/@id]" is O(n^2). 
> Can I use a key to avoid this?  How can a key be applied to an 
> exsl:node-set() result?
> 
> XSLT often needs some lateral thinking to come up with a good 
> solution. 
>   I hope that someone who's read this far has a trick to 
> solve this for me.
> 
> Thanks in advance!
> 
> --Phil.

Current Thread