[xsl] Different performance of nodesets created in different ways

Subject: [xsl] Different performance of nodesets created in different ways
From: "TAYLOR Peter J (AXA-I)" <Peter.J.Taylor@xxxxxxxxxxxxxxxxxxx>
Date: Fri, 1 Feb 2008 10:29:38 -0000
We recently experienced an out-of-memory error with an XSLT 1.0 stylesheet
which used the Xalan nodeset() function to convert an <xsl:variable> with a
non-empty body from a result tree fragment into a nodeset.
 
My test case data looks like this -
 
<a>
       <b>g-day<c>hello</c><c>hello</c><c>hello</c></b>
       <b>g-day<c>hello</c><c>hello</c><c>hello</c></b>
       ... several thousand more <b>...</b> elements like this ...
       <b>g-day<c>hello</c><c>hello</c><c>hello</c></b>
       <b>g-day<c>hello</c><c>hello</c><c>hello</c></b>
</a>
 
My (deeply flawed) test-case stylesheet originally looked like this -
 
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xalan="http://xml.apache.org/xalan"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
<xsl:output method="text"/>
 
<xsl:template match="/">
            <xsl:call-template name="bigMemoryUsage"/>
</xsl:template>
 
<xsl:template name="bigMemoryUsage">
            <xsl:variable name="big">
                        <!-- result tree fragment -->
                        <xsl:copy-of select="/a/b"/>
            </xsl:variable>
            <xsl:for-each select="/a/b">
                <xsl:variable name="i" select="position()"/>
                <xsl:value-of select="xalan:nodeset($big)/b[position()=$i]"/>
            </xsl:for-each>
</xsl:template>
 
</xsl:stylesheet>
 
When I changed the <xsl:variable> to get its value from the select="..."
attribute, i.e. to be of type node-set, and removed the call to
xalan:nodeset() -
 
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xalan="http://xml.apache.org/xalan"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
<xsl:output method="text"/>
 
<xsl:template match="/">
            <xsl:call-template name="smallMemoryUsage"/>
</xsl:template>
 
<xsl:template name="smallMemoryUsage">
            <xsl:variable name="small" select="/a/b"/> <!-- nodeset -->
            <xsl:for-each select="/a/b">
                <xsl:variable name="i" select="position()"/>
                <xsl:value-of select="$small/b[position()=$i]"/>
            </xsl:for-each>
</xsl:template>
 
</xsl:stylesheet>
 
my test case used a quarter as much memory.
 
The two versions of the stylesheet process the same nodes (or a copy of the
same nodes) and produce the same output. Unfortunately the "small memory"
version of the stylesheet ran for four times as long as the "big memory"
version.
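
One way to check that the two select expressions really address the same
nodes is to count what each one selects. The stylesheet below is only a
sketch of such a check, not something that was run as part of the tests
described here; it reuses the variable names from the two versions above. In
XPath 1.0, $small/b selects the b children of the nodes held in $small,
whereas xalan:nodeset($big)/b selects the b children of the root of the
fragment, so the two counts need not agree -

```xml
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xalan="http://xml.apache.org/xalan"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

<xsl:template match="/">
            <!-- nodeset selected directly from the source document -->
            <xsl:variable name="small" select="/a/b"/>
            <!-- result tree fragment holding copies of the same elements -->
            <xsl:variable name="big">
                        <xsl:copy-of select="/a/b"/>
            </xsl:variable>
            <!-- number of nodes each path expression selects -->
            <xsl:text>small: </xsl:text>
            <xsl:value-of select="count($small/b)"/>
            <xsl:text>&#10;big: </xsl:text>
            <xsl:value-of select="count(xalan:nodeset($big)/b)"/>
            <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

If the counts differ, the timings of the two versions are not comparing like
with like.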
 
When I experimentally changed the axis in the <xsl:value-of> from child to
descendant -
 
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xalan="http://xml.apache.org/xalan"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
<xsl:output method="text"/>
 
<xsl:template match="/">
            <xsl:call-template name="smallMemoryUsage"/>
</xsl:template>
 
<xsl:template name="smallMemoryUsage">
            <xsl:variable name="small" select="/a/b"/>
            <xsl:for-each select="/a/b">
                <xsl:variable name="i" select="position()"/>
                <!-- descendant axis -->
                <xsl:value-of select="$small//b[position()=$i]"/>
            </xsl:for-each>
</xsl:template>
 
</xsl:stylesheet>
 
the "small memory" stylesheet took five times as long again to run. However,
when I made the corresponding change to the "big memory" stylesheet -
 
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xalan="http://xml.apache.org/xalan"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
<xsl:output method="text"/>
 
<xsl:template match="/">
            <xsl:call-template name="bigMemoryUsage"/>
</xsl:template>
 
<xsl:template name="bigMemoryUsage">
            <xsl:variable name="big">
                        <xsl:copy-of select="/a/b"/>
            </xsl:variable>
            <xsl:for-each select="/a/b">
                <xsl:variable name="i" select="position()"/>
                <!-- descendant axis -->
                <xsl:value-of select="xalan:nodeset($big)//b[position()=$i]"/>
            </xsl:for-each>
</xsl:template>
 
</xsl:stylesheet>
 
the "big memory" stylesheet ran in about the same time as before.
 
I then rewrote the "small memory" stylesheet like this -
 
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xalan="http://xml.apache.org/xalan"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
<xsl:output method="text"/>
 
<xsl:template match="/">
            <xsl:call-template name="smallMemoryUsage"/>
</xsl:template>
 
<xsl:template name="smallMemoryUsage">
            <xsl:variable name="small" select="/a/b"/>
            <xsl:for-each select="$small/b">
                <xsl:value-of select="."/>
            </xsl:for-each>
</xsl:template>
 
</xsl:stylesheet>
 
Having got rid of the silly position() predicate, the performance of the
"small memory" stylesheet was about 100 times better. Making the same change
to the "big memory" stylesheet improved performance by about 50 times. This
presumably reflects the extra cost of using the Xalan nodeset() function.
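
A sketch of what the same change to the "big memory" stylesheet might look
like, assuming it simply mirrors the rewrite above (the exact stylesheet that
was timed is not shown here) -

```xml
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xalan="http://xml.apache.org/xalan"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

<xsl:template match="/">
            <xsl:call-template name="bigMemoryUsage"/>
</xsl:template>

<xsl:template name="bigMemoryUsage">
            <xsl:variable name="big">
                        <!-- result tree fragment -->
                        <xsl:copy-of select="/a/b"/>
            </xsl:variable>
            <!-- iterate over the converted fragment directly, no position() predicate -->
            <xsl:for-each select="xalan:nodeset($big)/b">
                <xsl:value-of select="."/>
            </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Here xalan:nodeset() is evaluated once, in the select of the <xsl:for-each>,
rather than once per iteration.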
 
I am now happy with the performance of the "small memory" stylesheet when it
is written sensibly, but I do not really understand why doing silly processing
against a nodeset created from a call to xalan:nodeset() seems to run about 20
times quicker than the same silly processing against a nodeset variable
created by the select="..." attribute of <xsl:variable>. Is it something to do
with whether my nodeset variable uses a NodeIterator, NodeList or NodeVector
under the bonnet? Am I accessing the nodes sequentially in one case and
positionally in the other? Am I missing something fundamental about how
predicates work?
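
One experiment that might help separate the cost of the xalan:nodeset() call
from the cost of the predicate is to convert the fragment once, outside the
loop, and then index into the resulting nodeset. The following is only a
sketch of that experiment and has not been timed; the variable name bigNodes
is made up for illustration -

```xml
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xalan="http://xml.apache.org/xalan"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

<xsl:template match="/">
            <xsl:call-template name="bigMemoryUsage"/>
</xsl:template>

<xsl:template name="bigMemoryUsage">
            <xsl:variable name="big">
                        <!-- result tree fragment -->
                        <xsl:copy-of select="/a/b"/>
            </xsl:variable>
            <!-- hypothetical helper variable: convert once, outside the loop -->
            <xsl:variable name="bigNodes" select="xalan:nodeset($big)/b"/>
            <xsl:for-each select="/a/b">
                <xsl:variable name="i" select="position()"/>
                <!-- the predicate filters the nodeset as a whole, in document order -->
                <xsl:value-of select="$bigNodes[position()=$i]"/>
            </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

In $bigNodes[position()=$i] the predicate applies to the whole nodeset held
in the variable, whereas in a path step such as $x/b[position()=$i] it is
evaluated separately for the b children of each node in $x.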
 
I have been running the above through Stylus Studio using Xalan 2.7.0 and
through an IBM Java 1.5 JVM also using Xalan 2.7.0. My version of XSLT is 1.0.
I've allocated my JVM between 64 MB and 500 MB of memory at various stages of
testing, and the production IBM Java 1.5 JVM which blew had 1.5 GB, and was
running Java code compiled at 1.4.2.
 
Any help would be greatly appreciated!
 
Pete Taylor
 
 

