RE: [xsl] Different performance of nodesets created in different ways

From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 1 Feb 2008 10:58:13 -0000
Performance questions like this are very product dependent. The node-set()
function in one processor might involve a major reorganization of the
underlying data structure, in another it might be a no-op. I think you're
better off discussing this on a Xalan list. However, I think there are one
or two points that one can make that are product-independent.

> <xsl:template name="bigMemoryUsage">
>             <xsl:variable name="big">  <!-- result tree fragment -->
>                         <xsl:copy-of select="/a/b"/>
>             </xsl:variable>
>             <xsl:for-each select="/a/b">
>                 <xsl:variable name="i" select="position()"/>
>                 <xsl:value-of select="xalan:nodeset($big)/b[position()=$i]"/>
>             </xsl:for-each>
> </xsl:template>

If the implementation of nodeset() does involve major reorganisation, and if
the processor doesn't do very advanced optimisation, then you're likely to
get a big saving by moving the call to nodeset() outside the loop, since as
written it is evaluated once per iteration.
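As a rough sketch of that change (not tested against any particular Xalan release), here is the quoted template with the conversion hoisted into its own variable. The template name hoistedNodeset and the variable name bigNodes are made up for illustration, and the sketch assumes the xalan prefix is bound to http://xml.apache.org/xalan, the namespace Xalan-J uses for its nodeset() extension. Whether it actually saves time or memory still depends on how the processor implements nodeset().

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xalan="http://xml.apache.org/xalan"
    exclude-result-prefixes="xalan">

  <xsl:template match="/">
    <xsl:call-template name="hoistedNodeset"/>
  </xsl:template>

  <!-- Hypothetical name; same logic as bigMemoryUsage -->
  <xsl:template name="hoistedNodeset">
    <!-- Result tree fragment holding a copy of the /a/b elements -->
    <xsl:variable name="big">
      <xsl:copy-of select="/a/b"/>
    </xsl:variable>
    <!-- Convert the fragment to a node-set once, before the loop,
         rather than once per iteration -->
    <xsl:variable name="bigNodes" select="xalan:nodeset($big)"/>
    <xsl:for-each select="/a/b">
      <xsl:variable name="i" select="position()"/>
      <xsl:value-of select="$bigNodes/b[position()=$i]"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The node-set returned by nodeset() contains the fragment's root node, so the path still needs the /b step to reach the copied elements.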
>  
> When I changed the <xsl:variable> to get its value from the
> select="..." attribute, i.e. to be of type node-set, and
> removed the call to xalan:nodeset() - my test case used a quarter as much
> memory.

Not really a surprise - one is copying the data, the other isn't. A factor
of 4 seems a bit high, though. Perhaps Xalan's nodeset() function is making
a second copy.
>  
> The two versions of the stylesheet process the same nodes (or
> a copy of the same nodes) and produce the same output.

I'm a bit surprised by that. I would have expected this:

            <xsl:variable name="small" select="/a/b"/>
            <xsl:for-each select="/a/b">
                <xsl:variable name="i" select="position()"/>
                <xsl:value-of select="$small/b[position()=$i]"/>
            </xsl:for-each>

to produce no output because $small is a set of b elements, which have no
children called b.
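If the intent was to pick out the $i-th b element, a sketch of what the node-set version would probably need (this is an assumption about the poster's intent) is to filter $small itself rather than step down to its children:

```xml
<xsl:variable name="small" select="/a/b"/>
<xsl:for-each select="/a/b">
    <xsl:variable name="i" select="position()"/>
    <!-- $small already holds the b elements, so select by position
         directly; no /b step is needed -->
    <xsl:value-of select="$small[position()=$i]"/>
</xsl:for-each>
```

In XPath 1.0 a predicate on a variable reference counts positions in document order, so $small[position()=$i] is the $i-th b element under /a.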

The performance difference is presumably some quirk of when Xalan does
direct addressing into a node-set and when it has to do a scan. The results
are unlikely to extrapolate to a different processor.

I'd be interested to see how the Saxon results compare. Saxon, for
example, creates a virtual copy when you do the xsl:copy-of, so the memory
overhead may be much less. Of course with XSLT 2.0 you don't need a
node-set() call at all, because a variable holding a temporary tree can be
navigated directly.
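For comparison, here is a sketch of the same loop in XSLT 2.0. The template name is hypothetical. The variable now holds a temporary tree (a document node) rather than a result tree fragment, so a path expression can be applied to it with no extension function:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:call-template name="noNodesetNeeded"/>
  </xsl:template>

  <xsl:template name="noNodesetNeeded">
    <!-- Temporary tree: a document node whose children are copies of /a/b -->
    <xsl:variable name="big">
      <xsl:copy-of select="/a/b"/>
    </xsl:variable>
    <xsl:for-each select="/a/b">
      <xsl:variable name="i" select="position()"/>
      <!-- Navigate the temporary tree directly; $i is an integer,
           so b[$i] is a positional predicate -->
      <xsl:value-of select="$big/b[$i]"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```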

Michael Kay
http://www.saxonica.com/
