RE: [xsl] Re: Checking for Cycles in a Graph

Subject: RE: [xsl] Re: Checking for Cycles in a Graph
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 1 May 2008 02:17:04 +0100
> I think I found a bug in some example code from XSLT 2.0 
> Programmer's Reference by Michael Kay and would like to 
> review it with Michael and/or other XSLT programmers. 

You are right, this code published in the 3rd edition is embarrassingly
wrong. There's a corrected version in the 4th edition, which I believe is
now shipping. Sorry about the inconvenience.

The corrected algorithm passes an extra parameter on each recursive call,
which is the list of nodes marking the route from the starting node to the
node currently being visited. If the nto currently being visited contains a
reference to any of the nodes in this list, there is a cycle in the data,
which can be reported.in

Most of the textbook algorithms for detecting cycles in graphs are written
in a procedural way, and the error arose because I tried to rethink the
problem in functional terms (and failed to get it right).

Michael Kay
http://www.saxonica.com/


> 
> p.199 Checking for Cycles in a Graph:
> 
> <xsl:function name="graph:refers" as="xs:boolean">
>    <xsl:param name="links" as="node()"/>
>    <!-- $links is a node that represents the template to be called -->
>    <xsl:param name="A" as="node()"/>
>    <xsl:param name="B" as="node()"/>
> 
>    <!-- find the directly-connected nodes -->
>    <xsl:variable name="direct" as="node()*">
>       <xsl:apply-templates select="$links">
>          <xsl:with-param name="from" select="$A"/>
>       </xsl:apply-templates>
>    </xsl:variable>
> 
>    <!-- return true if B is directly or indirectly connected 
> from A -->
>    <xsl:sequence select="if (exists($direct intersect $B)) then true()
>                          else if (some $C in $direct
>                                   satisfies graph:refers($links, $C,
> $B)) then true()
>                          else false()"/> </xsl:function>
> 
> I have defined a template for $links as required.  (But it is 
> not necessary to see the bug because I think the bug is in 
> the higher order function noted above.)
> 
> Typically cycles are checked for by evaluating:
> graph:refers($something, $something).  If there is connection 
> through $something's direct links and their direct links and 
> so on, then there is a cycle.
> 
> The problem is that "graph:refers" may enter an infinite loop 
> if $direct contains a cycle.  Consider the case where $A does 
> not refer to $B but one of $A's direct links contain a cycle. 
>  In the XPath for xsl:sequence, the function recursively 
> calls itself with "graph:refers($links, $C, $B)".  $C becomes 
> $A in the next iteration, and its direct links are found in 
> order to test if one of them refers back to $B.  But notice 
> that there is no check if $C contains a cycle!
>  So it is possible that $C's direct links could be navigated 
> forever....without reaching $B first.  (I am actually 
> experiencing this problem with a large data set and haven't 
> had time to make a simpler use-case.  But I am hoping that 
> people can follow the problem generically with the thought 
> exercise I just described.)
> 
> So it seems like $C the algorithm should check if $C contains 
> a cycle before testing "graph:refers($links,$C,$B)"... but 
> after some thinking, I am realizing that just because $C 
> contains a cycle, doesn't mean that $A doesn't refers to $B.  
> Perhaps the solution is to test if $C contains a cycle, and 
> if it does, remove the cycle(s) -> creating $D.  And then 
> test "graph:refers(l$inks,$D,$B)".
> 
> This seems complicated though... and I thought I should 
> discuss this issue with others before I continue to hack at 
> this algorithm.  It seems like this type of algorithm/problem 
> would be well-known for graphs?
> 
> Has anyone explored this problem?  Any suggestions?
> 
> Thanks
> Darcy

Current Thread