Re: [xsl] Recursive grouping - simple, XSLT 1.0, fast non-Muenchian grouping method

Subject: Re: [xsl] Recursive grouping - simple, XSLT 1.0, fast non-Muenchian grouping method
From: "M. David Peterson" <m.david.x2x2x@xxxxxxxxx>
Date: Mon, 20 Dec 2004 14:27:35 -0700
If I could throw in a quick curve ball to this conversation.  It seems
that what Surgiu has worked out is that by using variables to match
more than just the value of the attribute or element (e.g. element
name, for example, when using such information to determine proper
mapping to another XML file, similar to that of aspect weaving.)

Surgiu, you seem like a bright guy who would enjoy wrapping his teeth
around  a concept such as bringing together any number of external
data sources and weaving them together using commonalities between
these sources, whether through Regular Expressions (WooHoo XSLT 2.0!
:) or mapping of simpler comarisons using the translate() and
contains() functions to search for a particular straight string of
characters.  AspectXML is a project I worked on with one of my very
best friends Russ Miles who lives over in the UK (for now... just got
engaged to his LA based girlfriend so we'll see where he ends up)...  
While I think the concept is really cool and the code is Ok (refering
to the XSLT that I wrote and not the Java that Russ wrote... his Java
is near to perfect, reference his new release from Oreilly >
http://www.aspectjcookbook.com to see what I mean :) and while it has
technically been recommended as a production ready product by AOSD.net
there's still a TON more to do with it such as all the schema work.
You can access it at http://www.AspectXML.org and if you have a chance
to do so I would be interested to hear your take on how we could
improve it to incorporates some of your thoughts and ideas on this
topic, as this is where it seems you really seem to be headed and it
seems you have some good ideas on how to make that type of data
grouping really, really, fast...

I'll look forward to hearing from you if you have a chance to take a
look at it... you can come find me over at my blog, xsltblog.com or
post back to the list.  Ive been heads down for over a week and havent
even so much as opened up any email other than my Gmail account during
that time... but with the holidays here I hope to slow down so I will
keep an eye out for you here as well.

Best regards to you!

<M:D/>


On Mon, 20 Dec 2004 21:03:25 -0000, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > The presented algorithm works much faster than Muenchian
> > grouping without xsl:key
> 
> The use of xsl:key is essential to Muenchian grouping, so there seems to be
> some misunderstanding here.
> 
> >
> > The advantage of recursive grouping is that it is possible to define
> > grouping rules that are much more complex than one accepted by the use
> > attribute of xsl:key or by generate-id() function. For
> > example if in a list
> > of items that have attributes price and quantity the grouping
> > should be done
> > by price*quantity.
> 
> Muenchian grouping makes it easy to group on the result of any path
> expression, e.g.
> 
> <xsl:key name="m" match="order-line" group="@price * @qty"/>
> 
> If there is a limitation, it is that Muenchian grouping is inconvenient when
> the node-set to be grouped is anything other than "all nodes in a single
> document that match pattern P": that is, it's inconvenient when the
> population spans multiple documents, or when it is scoped to a particular
> subtree within a document.
> 
> >
> > A recoursive grouping with xsl:key is presented below.
> >
> 
> The downside here is:
> 
> >     <xsl:with-param name="list"
> > select="$list[not(@author=$group-identifier)]"/>
> 
> which performs a serial search of the grouping population (actually, on
> average, half the population) once for each distinct value of the grouping
> key. The algorithm therefore has order (at least) O(P*G) where P is the size
> of the population and G the number of groups. In applications where the
> number of groups increases with the population (e.g. grouping employees by
> surname) this is effectively O(N^2).
> 
> I agree that the algorithm is viable in cases where the number of groups is
> small and almost fixed, e.g. grouping sales by continent.
> 
> But of course if the number of groups is completely fixed, the simplest
> approach is:
> 
> for-each continent
>   for-each P[continent=current()]
> 
> 
> Michael Kay
> http://www.saxonica.com/
> 
> 


-- 
:: M. David Peterson ::
XML & XML Transformations, C#, .NET, and Functional Languages Specialist

Current Thread