[xsl] Some results (was: Re: [xsl] Recursive grouping - simple, XSLT 1.0, fast non-Muenchian grouping method)

Subject: [xsl] Some results (was: Re: [xsl] Recursive grouping - simple, XSLT 1.0, fast non-Muenchian grouping method)
From: Dimtre Novatchev <dnovatchev@xxxxxxxxx>
Date: Tue, 21 Dec 2004 20:09:22 +1100
Hi Sergio,

I performed some tests and here are the results on a source xml
document containing 20516 elements that had to be grouped in two
levels. The document has the same structure as the one in your
examples, but more "instances".

I had to upgrade your code to make it general, so that it can now
handle any number of levels of grouping. The names of the attributes
on which to group consecutively are given as a parameter.

Here's your modified code (50 lines):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>
    
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  
  <xsl:template match="/booklist">
    <authors>
      <xsl:call-template name="RecursiveGrouping">
        <xsl:with-param name="list" select="book"/>
        <xsl:with-param name="attrList" select="/*/book[1]/@*"/>
      </xsl:call-template>
    </authors>
  </xsl:template>

  <xsl:template name="RecursiveGrouping">
    <xsl:param name="list"/>
    <xsl:param name="attrList"/>
    
    
    <xsl:if test="$attrList">
       <!-- Selecting the first attribute name as group identifier and the group
      itself-->
      <xsl:variable name="attrName" select="name($attrList[1])"/>
    
      <xsl:variable name="group-identifier" select="$list[1]/@*[name()
= $attrName]"/>
      <xsl:variable name="group"
select="$list[@*[name()=$attrName]=$group-identifier]"/>
      
       <!-- Do some work for the group -->
       <xsl:element name="{$attrName}">
         <xsl:attribute name="value">
           <xsl:value-of select="$group-identifier"/>
         </xsl:attribute>

          <xsl:call-template name="RecursiveGrouping">
            <xsl:with-param name="list" select="$group"/>
            <xsl:with-param name="attrList" select="$attrList[position() > 1]"/>
          </xsl:call-template>
       </xsl:element>
  
       <!-- If there are other groups left, calls itself -->
      <xsl:if test="count($list)>count($group)">
        <xsl:call-template name="RecursiveGrouping">
          <xsl:with-param name="list"
              select="$list[not(@*[name()=$attrName]=$group-identifier)]"/>
          <xsl:with-param name="attrList" select="$attrList"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>


Here's the code using the Muenchian method:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>
 
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 
  <xsl:key name="kAuth" match="book" use="@author"/>
  <xsl:key name="kAuthTitle" match="book" 
   use="concat(@author, '+', @title)"/>
  
  <xsl:template match="/">
    <xsl:for-each select=
       "/*/book[generate-id()
               =
                generate-id(key('kAuth', @author)[1])
               ]">
      <xsl:element name="author">
        <xsl:attribute name="value">
          <xsl:value-of select="@author"/>
        </xsl:attribute>
        <xsl:for-each select=
          "key('kAuth', @author)
              [generate-id()
              =
               generate-id(key('kAuthTitle', 
                               concat(@author, '+', @title)
                               )[1]
                           )
              ]">
          <xsl:element name="title">
            <xsl:attribute name="value">
              <xsl:value-of select="@title"/>
            </xsl:attribute>
          </xsl:element>
        </xsl:for-each>
        </xsl:element>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

The results in milliseconds, using Saxon 6.5.3 on a 3GHz 2GB RAM PC
are as follows:

  Muenchian code:    1265

  Sergio recursive:  38188



I will be glad to provide the source xml document used in the test to
any interested person (and why not put them somewhere on the web?).

Cheers,

Dimitre


On Mon, 20 Dec 2004 19:47:07 +0200, Sergiu Ignat <sergiu.ignat@xxxxxxxx> wrote:
> Hello Dimitre,
> 
> The presented algorithm works much faster than Muenchian grouping without
> xsl:key for a large xml of about 720 elements that sould be grouped in
> groups of 1 to 5 elements per group, up to 5 times faster in that particular
> case.(If you want I can send you the input samples, they are too large to be
> sent to the group). It works slower than Muenchian method with xsl:key and
> generate-id(). The same recursive grouping can be implemented using xsl:key.
> It is much faster than the previous version but it is still a bit slower
> than Muenchian method with xsl:key and generate-id().
> 
> The advantage of recursive grouping is that it is possible to define
> grouping rules that are much more complex than one accepted by the use
> attribute of xsl:key or by generate-id() function. For example if in a list
> of items that have attributes price and quantity the grouping should be done
> by price*quantity.
> 
> Using any of this 2 methods it is also possible to generate a sorted output.
> In that case instead of selecting the grouping identifier of the first
> element in the list it is necessary to select the min or max grouping
> identifier value.
> 
> We are using XSL for reporting, and sometimes grouping rules are very
> complex.
> 
> A recoursive grouping with xsl:key is presented below.
> 
> <?xml version="1.0" encoding="UTF-8" ?>
> <xsl:stylesheet version="1.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>
> <xsl:key name="title-search" match="book" use="@author" />
> <xsl:template match="/booklist">
> 
>  <authors>
>   <xsl:call-template name="RecursiveGrouping">
>    <xsl:with-param name="list" select="book"/>
>   </xsl:call-template>
>  </authors>
> </xsl:template>
> 
> <xsl:template name="RecursiveGrouping">
>  <xsl:param name="list"/>
> 
>  <!-- Selecting the first author name as group identifier and the group
> itself-->
>  <xsl:variable name="group-identifier" select="$list[1]/@author"/>
>  <xsl:variable name="group"
> select="key('title-search',$group-identifier)"/>
> 
>  <!-- Do some work for the group -->
>  <author name="{$group-identifier}" number-of-books="{count($group)}"/>
> 
>  <!-- If there are other groups left, calls itself -->
>  <xsl:if test="count($list)>count($group)">
>  <xsl:call-template name="RecursiveGrouping">
>    <xsl:with-param name="list"
> select="$list[not(@author=$group-identifier)]"/>
>  </xsl:call-template>
>  </xsl:if>
> </xsl:template>
> </xsl:stylesheet>
> 
> the xml input is:
> 
> <booklist>
>   <book author="Frank Herbert" title="Dune"/>
>   <book author="Roberto Quaglia" title="Bread, Butter and Paradoxine"/>
>   <book author="Kate Wilhelm" title="Where Late the Sweet Bird Sang"/>
>   <book author="Anthony Burgess" title="A Clockwork Orange"/>
>   <book author="Frank Herbert" title="Dragon in the Sea"/>
>   <book author="Anthony Burgess" title="Earthly Powers"/>
>   <book author="Isaak Asimov" title="The Foundation Trilogy"/>
>   <book author="Frank Herbert" title="Children of Dune"/>
>   <book author="Isaak Asimov" title="The Caves of Steel"/>
> </booklist>
> 
> ----- Original Message -----
> From: "Dimtre Novatchev" <dnovatchev@xxxxxxxxx>
> To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
> Cc: "Sergiu Ignat (dyve)" <sergiu.ignat@xxxxxxxx>
> Sent: Saturday, December 18, 2004 12:49 AM
> Subject: Re: [xsl] Recursive grouping - simple, XSLT 1.0, fast non-Muenchian
> grouping method
> 
> On Thu, 16 Dec 2004 19:27:46 +0200, Sergiu Ignat (dyve)
> <sergiu.ignat@xxxxxxxx> wrote:
> > Hello everybody.
> >
> > I would like to present you a simple, XSLT 1.0, fast grouping method with
> a
> > O(N*log(N)) complexity, the same as sorting. The only grouping method I
> knew
> > so far is Muenchian that has O(N^2) complexity.
> 
> The complexity of the Muenchian method is much better than O(N^2) --
> can you provide an example with representative data, where the
> Muenchian method behaves O(N^2)?

Current Thread