Subject: [xsl] REPOST: Solution: Efficient grouping without using the Muenchian method or key() function From: "Robbert van Dalen" <juicer@xxxxxxxxx> Date: Thu, 20 Mar 2003 21:52:18 +0100 |
Hi all, This is a repost of the previous message but now in a much nicer format ;-) Cheers, RvD <!-- Provides two grouping templates - group-on-key - group-on-item Author: Robbert van Dalen (juicer@xxxxxxxxx) NOTE 1: This stylesheet relies on the :NODESET extension because the algorithm uses multiple passes and tree-fragments to compute the result. Otherwise it's a pure XSLT implementation. NOTE 2: Uses the XALAN :NODESET extension. Not tested are XSLT processors other then XALAN. RATIONALE: Grouping with XSLT can be done efficiently by the MUENCHIAN GROUPING technique using the key() function to identify nodes within the same group. The key() function however requires the nodeset to be a DOCUMENT nodeset. The main problem with MUENCHIAN GROUPING is then that it cannot group tree-fragments. Use this functionality if you want to group tree-fragments: ALGORITHM OUTLINE: The algorithm has three stages. 1) SORTING the nodes for a given key. 2) Computing the RANGEs of nodes which have the same key. 3) SELECTING the sorted nodes for each RANGE. So if we want to group to following nodeset: <c a="1"/> <c a="3"/> <c a="2"/> <c a="1"> <c a="3"/> <c a="2"/> <c a="1"/> Step 1: <c a="1"/><c a="1"/><c a="1"/><c a="2"/><c a="2"/><c a="3"/><c a="3"/> Step 2: (with step 2 as input) <r s="1" e="3"/> <r s="4" e="5"/> <r s="6" e="7"/> Note that the number of ranges equals the number of groups Step 3: (with step 1 and 2 as input) <g> <c a="1"/> <c a="1"/> <c a="1"/> </g> <g> <c a="2"/> <c a="2"/> </g> <g> <c a="3"/> <c a="3"/> </g> All this can be achieved by searching, selecting and filtering nodes with XPATH expressions. XPATH however, does not provide efficient means to select a RANGE of nodes within a nodeset. For instance, walking a list of N nodes can be implemented using recursion. But without tail-recursion elimination the recursion depth will be to deep for most XSLT implementations. Matters get worse when the number of groups is more or less equal to the number of nodes. Naieve implementations can easily take O(N^2) time, which is not acceptable. This implementation has a time complexity of O(log(N)*N) which is fine because sorting usually also takes O(log(N)*N) (depending on the XSLT's implementation of xsl:sort). To achieve this speed, step 3 is implemented by partitioning the nodeset into a binary tree. This binary tree is then used to select nodes within a range. Binary partitioning is an example of a DiVide and Conquer (DVC) algorithm neccesary to reduce recursion depth. After partitioning, the binary tree can be reused to implement other DVC algorithms more easily (because complex recursive algorithms can now be reimplemented by just walking the binary tree). --> <!-- Template: group-on-key Use this template to group <nodes> which share a common attribute <key> The result will be sub-sets of <nodes> surrounded by <group/> tags --> <xsl:template name="group-on-key"> <xsl:param name="nodes"/> <xsl:param name="key"/> <xsl:variable name="items"> <xsl:for-each select="$nodes"> <item> <key> <xsl:value-of select="./@*[name() = $key]"/> </key> <value> <xsl:copy-of select="."/> </value> </item> </xsl:for-each> </xsl:variable> <xsl:variable name="grouped-items"> <xsl:call-template name="group-on-item"> <xsl:with-param name="nodes" select="xalan:nodeset($items)/*"/> <xsl:with-param name="key" select="$key"/> </xsl:call-template> </xsl:variable> <xsl:for-each select="xalan:nodeset($grouped-items)/*"> <group> <xsl:for-each select="./*"> <xsl:copy-of select="./value/*[1]"/> </xsl:for-each> </group> </xsl:for-each> </xsl:template> <!-- Template: group-on-item Use this template to group <nodes> which share a common structure. You can build this structure yourself if you want to group on something else then keys. The structure is the <item> structure and has the following layout <item> <key> aKey (can be anything, preferrably a string) </key> <value> aValue (can be anything, probably a node(set)) </value> </item> Items will we grouped on the string value of <key> The result will be sub-sets of items surrounded by <group/> tags --> <xsl:template name="group-on-item"> <xsl:param name="nodes"/> <!-- Step 1 --> <xsl:variable name="sorted"> <xsl:for-each select="$nodes"> <xsl:sort select="./key[1]/"/> <xsl:copy-of select="."/> </xsl:for-each> </xsl:variable> <xsl:variable name="sorted-tree" select="xalan:nodeset($sorted)/*"/> <!-- Step 2.1 --> <xsl:variable name="pivots"> <xsl:call-template name="pivots"> <xsl:with-param name="nodes" select="$sorted-tree"/> </xsl:call-template> </xsl:variable> <!-- Step 2.2 --> <xsl:variable name="ranges"> <xsl:call-template name="ranges"> <xsl:with-param name="pivots" select="xalan:nodeset($pivots)/*"/> </xsl:call-template> </xsl:variable> <!-- Step 3.1 --> <xsl:variable name="partition"> <xsl:call-template name="partition"> <xsl:with-param name="node" select="$sorted-tree[1]"/> </xsl:call-template> </xsl:variable> <xsl:variable name="root-partition" select="xalan:nodeset($partition)/*[1]"/> <!-- Step 3.2 --> <xsl:for-each select="xalan:nodeset($ranges)/range"> <xsl:variable name="s" select="./start"/> <xsl:variable name="e" select="./end"/> <group> <xsl:call-template name="range-in-partition"> <xsl:with-param name="s" select="$s"/> <xsl:with-param name="e" select="$e"/> <xsl:with-param name="p" select="$root-partition"/> </xsl:call-template> </group> </xsl:for-each> </xsl:template> <xsl:template name="pivots"> <xsl:param name="nodes"/> <xsl:param name="key"/> <xsl:for-each select="$nodes"> <xsl:if test="not(string(./key[1]/) = string(./following-sibling::*[1]/key[1]/))"> <pivot> <xsl:value-of select="position()"/> </pivot> </xsl:if> </xsl:for-each> </xsl:template> <xsl:template name="ranges"> <xsl:param name="pivots"/> <xsl:for-each select="$pivots"> <xsl:variable name="p" select="./preceding-sibling::*[1]"/> <range> <start> <xsl:choose> <xsl:when test="$p"> <xsl:value-of select="$p + 1"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="1"/> </xsl:otherwise> </xsl:choose> </start> <end> <xsl:value-of select="string(.)"/> </end> </range> </xsl:for-each> </xsl:template> <!-- Template: partition Partition a nodeset into a binary tree This DVC algorithm has a target time complexity of O(log(N)*N) and a 'copy' complexity of O(N) (only O(N) nodes are copied). Reducing the number of nodes by a half for each step is the crux of the algorithm. One way of splitting the nodes is by using the expressions: '($nodes/*[position() < $mid)' and '($nodes/*[position() >= $mid)' with $mid is 'count($nodes) div 2'. However this will introduce a 'copy' complexity of O(log(N)*N). This implementation uses the 'following-sibling::*[I]' expression to get around this problem. One exception is that this operation must take O(I) time instead of O(N) time to get the I'th sibling in a list of N nodes. This, however depends on XSLT's implementation of 'following-sibling::*[I]'. The upside is that if it takes O(1) time then the overall time complexity will be O(N). --> <xsl:template name="partition"> <xsl:param name="node"/> <xsl:param name="s" select="(count($node/preceding-sibling::*)) + 1"/> <xsl:param name="e" select="(count($node/following-sibling::*)) + $s"/> <xsl:if test="$node"> <xsl:element name="r"> <xsl:element name="s"> <xsl:value-of select="$s"/> </xsl:element> <xsl:element name="e"> <xsl:value-of select="$e"/> </xsl:element> <xsl:choose> <xsl:when test="$s = $e"> <xsl:element name="leaf"> <xsl:copy-of select="$node"/> </xsl:element> </xsl:when> <xsl:otherwise> <xsl:variable name="w" select="floor(($e - $s + 1) div 2)"/> <xsl:variable name="m" select="$s + $w"/> <xsl:call-template name="partition"> <xsl:with-param name="node" select="$node"/> <xsl:with-param name="s" select="$s"/> <xsl:with-param name="e" select="$m - 1"/> </xsl:call-template> <xsl:call-template name="partition"> <xsl:with-param name="node" select="$node/following-sibling::*[$w]"/> <xsl:with-param name="s" select="$m"/> <xsl:with-param name="e" select="$e"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:element> </xsl:if> </xsl:template> <!-- Template: range-in-partition Selects a RANGE in a binary tree XSLT isn't really hmaking life easier but trying to do this in a DVC style directly without the help of a binary tree is even harder. --> <xsl:template name="range-in-partition"> <xsl:param name="p"/> <xsl:param name="s" select="$p/s/"/> <xsl:param name="e" select="$p/e/"/> <xsl:variable name="ps" select="number($p/s/)"/> <xsl:variable name="pe" select="number($p/e/)"/> <xsl:if test="$s <= $pe and $e >= $ps"> <xsl:if test="$ps = $pe"> <xsl:copy-of select="$p/leaf/*[1]"/> </xsl:if> <xsl:choose> <xsl:when test="$s > $ps"> <xsl:variable name="s1" select="$s"/> <xsl:choose> <xsl:when test="$e < $pe"> <xsl:variable name="e1" select="$e"/> <xsl:for-each select="$p/*"> <xsl:call-template name="range-in-partition"> <xsl:with-param name="s" select="$s1"/> <xsl:with-param name="e" select="$e1"/> <xsl:with-param name="p" select="."/> </xsl:call-template> </xsl:for-each> </xsl:when> <xsl:otherwise> <xsl:variable name="e1" select="$pe"/> <xsl:for-each select="$p/*"> <xsl:call-template name="range-in-partition"> <xsl:with-param name="s" select="$s1"/> <xsl:with-param name="e" select="$e1"/> <xsl:with-param name="p" select="."/> </xsl:call-template> </xsl:for-each> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:otherwise> <xsl:variable name="s1" select="$ps"/> <xsl:choose> <xsl:when test="$e < $pe"> <xsl:variable name="e1" select="$e"/> <xsl:for-each select="$p/*"> <xsl:call-template name="range-in-partition"> <xsl:with-param name="s" select="$s1"/> <xsl:with-param name="e" select="$e1"/> <xsl:with-param name="p" select="."/> </xsl:call-template> </xsl:for-each> </xsl:when> <xsl:otherwise> <xsl:variable name="e1" select="$pe"/> <xsl:for-each select="$p/*"> <xsl:call-template name="range-in-partition"> <xsl:with-param name="s" select="$s1"/> <xsl:with-param name="e" select="$e1"/> <xsl:with-param name="p" select="."/> </xsl:call-template> </xsl:for-each> </xsl:otherwise> </xsl:choose> </xsl:otherwise> </xsl:choose> </xsl:if> </xsl:template> </xsl:stylesheet> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
Re: [xsl] Loop through attributes, J.Pietschmann | Thread | [xsl] How to Process when Row data , Imrran Wahid |
Re: [xsl] where does one stick the , W. Eliot Kimber | Date | [xsl] How to Process when Row data , Imrran Wahid |
Month |