Re: topological sort

Subject: Re: topological sort
From: "Peter B. West" <pbwest@xxxxxxxxxxxxxx>
Date: Wed, 08 Nov 2000 15:46:48 +0000
I have been waiting with bated breath for a response to this post, but,
as I have not yet seen one (I'm subscribed to the digest), I will have a
go at it myself.  My XSLT is not all that flash, so please bear with me.

I struggled to follow this stylesheet.  The twists and turns of the
select condition were too much for this aging and procedurally-inclined
mind.  I had to insert the output of various sub-sets of the select
before I could get a handle on it.  When I started to think of
"count(...)=0" as a kind of a NOT, things became a bit clearer.  See
below.

> From: Joerg Pietschmann <joerg.pietschmann@xxxxxx>
> Subject: topological sort
> 
> Hello,
> i want to implement a topological sort in XSLT. This is necessary
> for generating program code (IDL for example) out of an XML file
> which contains a specification for program structures, like the
> following:
> 
> <!DOCTYPE structs [
> <!ELEMENT structs (struct*)>
> <!ELEMENT struct (name,field*)>
> <!ELEMENT field (name,type)>
> <!ELEMENT name (#PCDATA)>
> <!ELEMENT type (ref|long)>
> <!ELEMENT ref (#PCDATA)>
> <!ELEMENT long EMPTY>
> ]>
> 
> <structs>
>   <struct>
>     <name>A</name>
>     <field>
>       <name>A1</name>
>       <type><long/></type>
>     </field>
>   </struct>
>   <struct>
>     <name>B</name>
>     <field>
>       <name>B1</name>
>       <type><ref>A</ref></type>
>     </field>
>     <field>
>       <name>B2</name>
>       <type><ref>C</ref></type>
>     </field>
>   </struct>
>   <struct>
>     <name>C</name>
>     <field>
>       <name>C1</name>
>       <type><long/></type>
>     </field>
>   </struct>
>   <struct>
>     <name>D</name>
>     <field>
>       <name>D1</name>
>       <type><ref>E</ref></type>
>     </field>
>   </struct>
>   <struct>
>     <name>E</name>
>     <field>
>       <name>E1</name>
>       <type><ref>C</ref></type>
>     </field>
>   </struct>
> </structs>
> 
> The data types for structure fields may refer to other structures.
> Programming languages usually require the structures alredy defined
> before they can be referenced. I don't want to put the burden on the
> user to enter the specifications in proper order.
> Preferably the algorithm should not be confused by reference loops.
> 
> The usual algorithm is to walk the dependencies while keeping a list
> with already processed structure definitions. This, however requires
> assignable variables for bookkeeping.
> Another solution is to output the structures at levels of increasing
> distance from the leaves in the dependency graph. This can be
> implemented
> in XSLT but it is somewhat unelegant.
> 
> Some pseudo code for orientation:
> 
> processed_list=nil
> template process
>   generate structures which are not in the processed_list
>                       and only refer to structure types not in the
> processed_list
>   add the structure to the processed list
>   call process

This threw me.  Maybe I still don't understand, but shouldn't that be:

   generate structures which are
		NOT processed
		AND
			have no references
			OR
			only refer to structure types IN processed list
?

> The recursion may be terminated either if the processed list contains
> all
> structures (equivalently if it is exactly as long as the list with all
> structures) or if no more structures has been processed in a step.
> The first condition alone is not sufficient in case of loops which are
> stopped by the second condition, however, the second condition alone
> imposes an extra step. The following stylesheet uses only the second
> condition. The concatenations with the double colons should guard
> against
> spurious substring matches.
> 
> <xsl:stylesheet version="1.0"
>                 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>
>   <xsl:output method="text"/>
> 
>   <xsl:template match="structs">
>     <xsl:call-template name="process">
>       <xsl:with-param name="processed" select="':'"/>
>     </xsl:call-template>
>   </xsl:template>
> 
>   <xsl:template name="process">
>     <xsl:param name="processed"/>
>     <xsl:for-each
> select="/structs/struct[not(contains($processed,concat(':',name,':')))
>  and count(field[count(type/ref)>0 and
> not(contains($processed,concat(':',./type/ref,':')))])=0]">
>       <xsl:value-of select="./name"/>
>     </xsl:for-each>
>     <xsl:variable name="actualprocessed">
>       <xsl:for-each
> select="/structs/struct[not(contains($processed,concat(':',name,':')))
>  and count(field[count(type/ref)>0 and
> not(contains($processed,concat(':',./type/ref,':')))])=0]">
>         <xsl:value-of select="concat(./name,':')"/>
>       </xsl:for-each>
>     </xsl:variable>
>     <xsl:if test="string-length($actualprocessed)>0">
>       <xsl:call-template name="process">
>         <xsl:with-param name="processed"
> select="concat($processed,$actualprocessed)"/>
>       </xsl:call-template>
>     </xsl:if>
>   </xsl:template>
> 
> </xsl:stylesheet>
> 
> This outputs "ACBED" which is the correct dependency order.
> 
> Is there a better solution, or can the stylesheet be improved?
> What bothers me most is that the struct elements must be processed
> twice,
> one for output and again for updating the processed_list.
> 
> Thanks for help
> 
> J.Pietschmann

I finally decided that the select was:
NOT processed
AND (
      NOT (
		field/type/ref exists
		AND
			NOT field/type/ref reference processed
          )
     )

which, when somebody's (De Morgan's?) rule
( !(A and B) ) = ( !A or !B )
is applied to  (!(A and (!B))) gives
(!A or !(!B)), i.e. (!A or B), which looks easier to me.  So the select
becomes
NOT processed
AND (
	field/type/ref does NOT exist
	OR
	field/type/ref reference processed
    )

The question is, what happens when all of the elements have type/ref, or
when none of the elements which have type/ref refer to any elements
which do not?  How do any of the elements with type/ref get themselves
processed?  I may have missed something here.

The other thing concerns the duplication of the select.  Why not just
use the $actualprocessed variable for both requirements?  Anyway, the
following stylesheet seems to work in the same way as the orginal, on
the same data.

<?xml version="1.0"?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
		version="1.0">

  <xsl:output method="text"/>

  <xsl:template match="structs">
    <xsl:call-template name="process">
      <xsl:with-param name="processed" select="':'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="process">
    <xsl:param name="processed"/>
    <xsl:variable name="actualprocessed">
      <xsl:for-each
	select=
	  "/structs/struct[
	    not(contains($processed,concat(':',name,':')))
	    and
	    count(
              field[count(type/ref)=0
		    or
		    contains(
		      $processed,concat(':',./type/ref,':'))
		   ])>0
           ]">
        <xsl:value-of select="concat(./name,':')"/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:if test="string-length($actualprocessed)>0">
      <xsl:value-of
	select="translate($actualprocessed,':','')"/>
      <xsl:call-template name="process">
        <xsl:with-param name="processed"
	select="concat($processed,$actualprocessed)"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

I suspect that something useful could be done with keys here, but I have
no idea what.

In Domino.
Peter
-- 
Peter B. West  pbwest@xxxxxxxxxxxxxx  http://powerup.com.au/~pbwest
"Lord, to whom shall we go?"


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread