RE: [xsl] Comparing grouping techniques in terms of performance

Subject: RE: [xsl] Comparing grouping techniques in terms of performance
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Tue, 6 Apr 2004 18:27:03 +0100
You seem to have varied several things between the two stylesheets. One of
them uses for-each, another uses apply-templates; one uses the generate-id()
approach to compare node identity, the other uses the count($X|.) technique;
one adds more output; one does sorting. The golden rule with performance
comparisons is to only change one variable at a time. And then you need to
repeat the measurements with a different XSLT processor to see whether the
results are similar.
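
A rough sketch of what a one-variable comparison could look like, assuming
the for-each stylesheet is taken as the baseline: keep the for-each
structure, the sorting and the output exactly as they are, and swap only the
count(. | ...) test for the generate-id() test. The explicit [1] is optional
(generate-id() uses the first node of its argument in document order anyway)
and is kept here only to mirror the baseline.

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:key name="contacts-by-surname" match="contact" use="surname" />

<xsl:template match="records">
 <!-- Only change from the baseline: node identity is tested with
      generate-id() instead of count(. | ...) = 1 -->
 <xsl:for-each select="contact[generate-id() =
     generate-id(key('contacts-by-surname', surname)[1])]">
  <xsl:sort select="surname" />
  <xsl:value-of select="surname" />,<br />
  <!-- Inner loop, sorting and output are unchanged -->
  <xsl:for-each select="key('contacts-by-surname', surname)">
   <xsl:sort select="forename" />
   <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
  </xsl:for-each>
 </xsl:for-each>
</xsl:template>

</xsl:transform>
```

Any timing difference between this and the baseline can then be put down to
the node-identity test alone. The same one-change-at-a-time treatment would
apply to for-each versus apply-templates, to the sorting, and to the extra
output.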

Michael Kay 

> -----Original Message-----
> From: Pieter Reint Siegers Kort [mailto:pieter.siegers@xxxxxxxxxxx] 
> Sent: 06 April 2004 16:43
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Comparing grouping techniques in terms of performance
> 
> Hi all,
>  
> Looking at various requests on the list regarding grouping, especially the
> Muenchian Method, explained very well by Jeni at
> http://www.jenitennison.com/xslt/grouping/muenchian.html, and at another
> method I have regularly seen, which uses template processing rather than
> the <for-each> approach (see below), I wanted to see how the two methods
> compare in terms of performance.
>  
> So, suppose I have the same input that Jeni uses, but enlarged into a
> bigger XML file (about 2000 entries):
>  
> <records>
>  <contact id="0001">
>   <title>Mr</title>
>   <forename>John</forename>
>   <surname>Smith</surname>
>  </contact>
>  <contact id="0002">
>   <title>Dr</title>
>   <forename>Amy</forename>
>   <surname>Jones</surname>
>  </contact>
>  <contact id="0002">
>   <title>Mr</title>
>   <forename>Brian</forename>
>   <surname>Jones</surname>
>  </contact>
>  <contact id="0002">
>   <title>Ms</title>
>   <forename>Fiona</forename>
>   <surname>Smith</surname>
>  </contact>
> ... repeating the above block ...
> </records>
> 
>  
> Using the <for-each> approach on my machine [Dell GX-240, Win2003,
> XSelerator 2.6, MSXML 4.0], like this:
>  
> <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
>  
> <xsl:key name="contacts-by-surname" match="contact" use="surname" />
>  
> <xsl:template match="records">
>  <xsl:for-each select="contact[count(. | key('contacts-by-surname', surname)[1]) = 1]">
>   <xsl:sort select="surname" />
>   <xsl:value-of select="surname" />,<br />
>   <xsl:for-each select="key('contacts-by-surname', surname)">
>    <xsl:sort select="forename" />
>    <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
>   </xsl:for-each>
>  </xsl:for-each>
> </xsl:template>
>  
> </xsl:transform>
>  
> showed that the transformation took about 750 msec.
>  
> Then, using the template approach (adding just a bit of 
> HTML), as follows:
>  
> <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
>  
> <xsl:key name="contacts-by-surname" match="contact" use="surname" />
>  
> <xsl:template match="records">
>    <html>
>      <body>
>          <xsl:apply-templates select="contact[generate-id() =
> generate-id(key('contacts-by-surname', surname))]" mode="groups"/>
>      </body>
>    </html>
> </xsl:template>
>  
> <xsl:template match="contact" mode="groups">
>    <ul>
>    <xsl:value-of select="surname"/>,<br/><xsl:apply-templates
> select="key('contacts-by-surname', surname)"/>
>    </ul>
> </xsl:template>
>  
> <xsl:template match="contact">
>  &#160;&#160;&#160;&#160;&#160;&#160;<xsl:value-of
> select="forename"/>&#160;(<xsl:value-of select="title"/>)<br/>
> </xsl:template>
>  
> </xsl:transform>
>  
> which does practically the same thing, took only about 50 msec, which
> means it ran 750/50 = 15 times faster!!
>  
> I haven't yet been able to test with the .NET XslTransform class, but that
> will come at a later stage...
>  
> So for big input files and using MSXML 4.0, I would rather use the second
> approach... wouldn't you all agree?
>  
> And if so, shouldn't the second method be the first (and preferred) method
> mentioned by Jeni (after all, everyone points to that page in the first
> instance)?
>  
> <prs/>
> http://www.pietsieg.com
> http://www.pietsieg.com/dotnetnuke
> Contributor on www.ASPToday.com
> Co-author on "Professional ASP.NET XML with C#", July 2002 by Wrox Press
