RE: [xsl] Recursive grouping - simple, XSLT 1.0, fast non-Muenchian grouping method

Subject: RE: [xsl] Recursive grouping - simple, XSLT 1.0, fast non-Muenchian grouping method
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 20 Dec 2004 21:03:25 -0000
> The presented algorithm works much faster than Muenchian 
> grouping without xsl:key 

The use of xsl:key is essential to Muenchian grouping, so there seems to be
some misunderstanding here.

> The advantage of recursive grouping is that it is possible to define
> grouping rules that are much more complex than one accepted by the use
> attribute of xsl:key or by generate-id() function. For 
> example if in a list
> of items that have attributes price and quantity the grouping 
> should be done
> by price*quantity.

Muenchian grouping makes it easy to group on the result of any path
expression, e.g.

<xsl:key name="m" match="order-line" group="@price * @qty"/>

If there is a limitation, it is that Muenchian grouping is inconvenient when
the node-set to be grouped is anything other than "all nodes in a single
document that match pattern P": that is, it's inconvenient when the
population spans multiple documents, or when it is scoped to a particular
subtree within a document.  

> A recoursive grouping with xsl:key is presented below.

The downside here is:

>     <xsl:with-param name="list"
> select="$list[not(@author=$group-identifier)]"/>

which performs a serial search of the grouping population (actually, on
average, half the population) once for each distinct value of the grouping
key. The algorithm therefore has order (at least) O(P*G) where P is the size
of the population and G the number of groups. In applications where the
number of groups increases with the population (e.g. grouping employees by
surname) this is effectively O(N^2). 

I agree that the algorithm is viable in cases where the number of groups is
small and almost fixed, e.g. grouping sales by continent. 

But of course if the number of groups is completely fixed, the simplest
approach is:

for-each continent
  for-each P[continent=current()]

Michael Kay

Current Thread