[xsl] REPOST: Solution: Efficient grouping without using the Muenchian method or key() function

Subject: [xsl] REPOST: Solution: Efficient grouping without using the Muenchian method or key() function
From: "Robbert van Dalen" <juicer@xxxxxxxxx>
Date: Thu, 20 Mar 2003 21:52:18 +0100
Hi all,

This is a repost of the previous message but now in a much nicer format ;-)

Cheers,
RvD


<!--
 Provides two grouping templates

 - group-on-key
 - group-on-item

 Author: Robbert van Dalen (juicer@xxxxxxxxx)

 NOTE 1:
 This stylesheet relies on the :NODESET extension because the algorithm uses
multiple passes and tree-fragments to compute the result. Otherwise it's a pure
XSLT implementation.

 NOTE 2:
 Uses the XALAN :NODESET extension. Not tested are XSLT processors other then
XALAN.

 RATIONALE:
 Grouping with XSLT can be done efficiently by the MUENCHIAN GROUPING technique
using the key() function to identify nodes within the same group.
 The key() function however requires the nodeset to be a DOCUMENT nodeset.
 The main problem with MUENCHIAN GROUPING is then that it cannot group
tree-fragments.
 Use this functionality if you want to group tree-fragments:

 ALGORITHM OUTLINE:
 The algorithm has three stages.

 1) SORTING the nodes for a given key.
 2) Computing the RANGEs of nodes which have the same key.
 3) SELECTING the sorted nodes for each RANGE.


 So if we want to group to following nodeset:
 <c a="1"/> <c a="3"/> <c a="2"/> <c a="1"> <c a="3"/> <c a="2"/> <c a="1"/>

 Step 1:
 <c a="1"/><c a="1"/><c a="1"/><c a="2"/><c a="2"/><c a="3"/><c a="3"/>

 Step 2: (with step 2 as input)
 <r s="1" e="3"/> <r s="4" e="5"/> <r s="6" e="7"/>
 Note that the number of ranges equals the number of groups

 Step 3: (with step 1 and 2 as input)
 <g> <c a="1"/> <c a="1"/> <c a="1"/> </g>
 <g> <c a="2"/> <c a="2"/> </g>
 <g> <c a="3"/> <c a="3"/> </g>

 All this can be achieved by searching, selecting and filtering nodes with XPATH
expressions. XPATH however, does not provide efficient means to select a RANGE
of nodes within a nodeset.
 For instance, walking a list of N nodes can be implemented using recursion. But
without tail-recursion elimination the recursion depth will be to deep for most
XSLT implementations.

 Matters get worse when the number of groups is more or less equal to the number
of nodes. Naieve implementations can easily take O(N^2) time, which is not
acceptable.
 This implementation has a time complexity of O(log(N)*N) which is fine because
sorting usually also takes O(log(N)*N) (depending on the XSLT's implementation
of xsl:sort).
 To achieve this speed, step 3 is implemented by partitioning the nodeset into a
binary tree. This binary tree is then used to select nodes within a range.

 Binary partitioning is an example of a DiVide and Conquer (DVC) algorithm
neccesary to reduce recursion depth. After partitioning, the binary tree can be
reused to implement other DVC algorithms more easily (because complex recursive
algorithms can now be reimplemented by just walking the binary tree).
 -->


 <!--
 Template: group-on-key
 Use this template to group <nodes> which share a common attribute <key>
 The result will be sub-sets of <nodes> surrounded by <group/> tags
 -->

 <xsl:template name="group-on-key">
  <xsl:param name="nodes"/>
  <xsl:param name="key"/>

  <xsl:variable name="items">
   <xsl:for-each select="$nodes">
    <item>
     <key>
      <xsl:value-of select="./@*[name() = $key]"/>
     </key>
     <value>
      <xsl:copy-of select="."/>
     </value>
    </item>
   </xsl:for-each>
  </xsl:variable>

  <xsl:variable name="grouped-items">
   <xsl:call-template name="group-on-item">
    <xsl:with-param name="nodes" select="xalan:nodeset($items)/*"/>
    <xsl:with-param name="key" select="$key"/>
   </xsl:call-template>
  </xsl:variable>

  <xsl:for-each select="xalan:nodeset($grouped-items)/*">
   <group>
    <xsl:for-each select="./*">
     <xsl:copy-of select="./value/*[1]"/>
    </xsl:for-each>
   </group>
  </xsl:for-each>

 </xsl:template>

 <!--
 Template: group-on-item
 Use this template to group <nodes> which share a common structure.
 You can build this structure yourself if you want to group on something else
then keys.

 The structure is the <item> structure and has the following layout
 <item>
  <key>
   aKey (can be anything, preferrably a string)
   </key>
  <value>
  aValue (can be anything, probably a node(set))
  </value>
 </item>

 Items will we grouped on the string value of <key>
 The result will be sub-sets of items surrounded by <group/> tags
 -->

 <xsl:template name="group-on-item">
  <xsl:param name="nodes"/>

        <!-- Step 1 -->
  <xsl:variable name="sorted">
   <xsl:for-each select="$nodes">
    <xsl:sort select="./key[1]/"/>
    <xsl:copy-of select="."/>
   </xsl:for-each>
  </xsl:variable>

  <xsl:variable name="sorted-tree" select="xalan:nodeset($sorted)/*"/>

        <!-- Step 2.1 -->
  <xsl:variable name="pivots">
   <xsl:call-template name="pivots">
    <xsl:with-param name="nodes" select="$sorted-tree"/>
   </xsl:call-template>
  </xsl:variable>

        <!-- Step 2.2 -->
  <xsl:variable name="ranges">
   <xsl:call-template name="ranges">
    <xsl:with-param name="pivots" select="xalan:nodeset($pivots)/*"/>
   </xsl:call-template>
  </xsl:variable>

        <!-- Step 3.1 -->
  <xsl:variable name="partition">
   <xsl:call-template name="partition">
    <xsl:with-param name="node" select="$sorted-tree[1]"/>
   </xsl:call-template>
  </xsl:variable>

  <xsl:variable name="root-partition" select="xalan:nodeset($partition)/*[1]"/>

        <!-- Step 3.2 -->
  <xsl:for-each select="xalan:nodeset($ranges)/range">
   <xsl:variable name="s" select="./start"/>
   <xsl:variable name="e" select="./end"/>

   <group>
    <xsl:call-template name="range-in-partition">
     <xsl:with-param name="s" select="$s"/>
     <xsl:with-param name="e" select="$e"/>
     <xsl:with-param name="p" select="$root-partition"/>
    </xsl:call-template>
   </group>
  </xsl:for-each>

 </xsl:template>

 <xsl:template name="pivots">
  <xsl:param name="nodes"/>
  <xsl:param name="key"/>

  <xsl:for-each select="$nodes">
   <xsl:if test="not(string(./key[1]/) =
string(./following-sibling::*[1]/key[1]/))">
    <pivot>
     <xsl:value-of select="position()"/>
    </pivot>
   </xsl:if>
  </xsl:for-each>
 </xsl:template>

 <xsl:template name="ranges">
  <xsl:param name="pivots"/>

  <xsl:for-each select="$pivots">
   <xsl:variable name="p" select="./preceding-sibling::*[1]"/>
   <range>
    <start>
     <xsl:choose>
      <xsl:when test="$p">
       <xsl:value-of select="$p + 1"/>
      </xsl:when>
      <xsl:otherwise>
       <xsl:value-of select="1"/>
      </xsl:otherwise>
     </xsl:choose>
    </start>
    <end>
     <xsl:value-of select="string(.)"/>
    </end>
   </range>
  </xsl:for-each>
 </xsl:template>


 <!--
 Template: partition
 Partition a nodeset into a binary tree

 This DVC algorithm has a target time complexity of O(log(N)*N) and a 'copy'
complexity of O(N) (only O(N) nodes are copied).

 Reducing the number of nodes by a half for each step is the crux of the
algorithm. One way of splitting the nodes is by using the expressions:
'($nodes/*[position() &lt; $mid)' and '($nodes/*[position() &gt;= $mid)' with
$mid is 'count($nodes) div 2'. However this will introduce a 'copy' complexity
of O(log(N)*N).

 This implementation uses the 'following-sibling::*[I]' expression to get around
this problem. One exception is that this operation must take O(I) time instead
of O(N) time to get the I'th sibling in a list of N nodes.
 This, however depends on XSLT's implementation of 'following-sibling::*[I]'.
The upside is that if it takes O(1) time then the overall time complexity will
be O(N).
 -->

 <xsl:template name="partition">
  <xsl:param name="node"/>
  <xsl:param name="s" select="(count($node/preceding-sibling::*)) + 1"/>
  <xsl:param name="e" select="(count($node/following-sibling::*)) + $s"/>

  <xsl:if test="$node">
   <xsl:element name="r">
    <xsl:element name="s">
     <xsl:value-of select="$s"/>
    </xsl:element>
    <xsl:element name="e">
     <xsl:value-of select="$e"/>
    </xsl:element>
    <xsl:choose>
     <xsl:when test="$s = $e">
      <xsl:element name="leaf">
       <xsl:copy-of select="$node"/>
      </xsl:element>
     </xsl:when>
     <xsl:otherwise>
      <xsl:variable name="w" select="floor(($e - $s + 1) div 2)"/>
      <xsl:variable name="m" select="$s + $w"/>
      <xsl:call-template name="partition">
       <xsl:with-param name="node" select="$node"/>
       <xsl:with-param name="s" select="$s"/>
       <xsl:with-param name="e" select="$m - 1"/>
      </xsl:call-template>
      <xsl:call-template name="partition">
       <xsl:with-param name="node" select="$node/following-sibling::*[$w]"/>
       <xsl:with-param name="s" select="$m"/>
       <xsl:with-param name="e" select="$e"/>
      </xsl:call-template>
     </xsl:otherwise>
    </xsl:choose>
   </xsl:element>
  </xsl:if>
 </xsl:template>

 <!--
 Template: range-in-partition
 Selects a RANGE in a binary tree

 XSLT isn't really hmaking life easier but trying to do this in a DVC style
directly without the help of a binary tree is even harder.
 -->

 <xsl:template name="range-in-partition">
  <xsl:param name="p"/>
  <xsl:param name="s" select="$p/s/"/>
  <xsl:param name="e" select="$p/e/"/>

  <xsl:variable name="ps" select="number($p/s/)"/>
  <xsl:variable name="pe" select="number($p/e/)"/>

  <xsl:if test="$s &lt;= $pe and $e &gt;= $ps">
   <xsl:if test="$ps = $pe">
    <xsl:copy-of select="$p/leaf/*[1]"/>
   </xsl:if>
   <xsl:choose>
    <xsl:when test="$s &gt; $ps">
     <xsl:variable name="s1" select="$s"/>
     <xsl:choose>
      <xsl:when test="$e &lt; $pe">
       <xsl:variable name="e1" select="$e"/>
       <xsl:for-each select="$p/*">
        <xsl:call-template name="range-in-partition">

         <xsl:with-param name="s" select="$s1"/>

         <xsl:with-param name="e" select="$e1"/>

         <xsl:with-param name="p" select="."/>

        </xsl:call-template>
       </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
       <xsl:variable name="e1" select="$pe"/>
       <xsl:for-each select="$p/*">

        <xsl:call-template name="range-in-partition">

         <xsl:with-param name="s" select="$s1"/>

         <xsl:with-param name="e" select="$e1"/>

         <xsl:with-param name="p" select="."/>

        </xsl:call-template>
       </xsl:for-each>
      </xsl:otherwise>
     </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
     <xsl:variable name="s1" select="$ps"/>
     <xsl:choose>
      <xsl:when test="$e &lt; $pe">
       <xsl:variable name="e1" select="$e"/>
       <xsl:for-each select="$p/*">

        <xsl:call-template name="range-in-partition">

         <xsl:with-param name="s" select="$s1"/>

         <xsl:with-param name="e" select="$e1"/>

         <xsl:with-param name="p" select="."/>

        </xsl:call-template>
       </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
       <xsl:variable name="e1" select="$pe"/>
       <xsl:for-each select="$p/*">

        <xsl:call-template name="range-in-partition">

         <xsl:with-param name="s" select="$s1"/>

         <xsl:with-param name="e" select="$e1"/>

         <xsl:with-param name="p" select="."/>

        </xsl:call-template>
       </xsl:for-each>
      </xsl:otherwise>
     </xsl:choose>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:if>
 </xsl:template>

</xsl:stylesheet>





 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread