[xsl] Performance of using predicates vs key function in a large scale xml problem.

Subject: [xsl] Performance of using predicates vs key function in a large scale xml problem.
From: "Yang" <sfyang@xxxxxxxxxxxxx>
Date: Tue, 3 Jul 2001 18:29:50 +0800
Hi,
I have picked up a lot of XSLT knowledge from XSL-List. This may be a FAQ: getting the output associated with a given $id condition by using a predicate, something like:

<xsl:copy>
<xsl:copy-of select="$source[@id = $id]"/>
</xsl:copy>

This simple pattern works well for small problems. However, it becomes dramatically slower once it is applied to a large problem with thousands of records, because each evaluation of the predicate typically scans every node in $source.
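For comparison, here is a minimal sketch of the same lookup done with a key. It assumes the nodes in $source are exactly the elements carrying an id attribute in the current source document; the key name by-id, the result wrapper element and the parameter default are hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical key: index every element that has an id attribute by its value.
       A processor can build this index once and reuse it for every lookup
       instead of rescanning the nodes for each $id. -->
  <xsl:key name="by-id" match="*[@id]" use="@id"/>

  <!-- Hypothetical stand-in for the $id condition used above -->
  <xsl:param name="id" select="'1'"/>

  <xsl:template match="/">
    <result>
      <!-- Same nodes as //*[@id = $id], but fetched through the key -->
      <xsl:copy-of select="key('by-id', $id)"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Whether the index is really built only once depends on the processor; the XSLT 1.0 recommendation defines what key() returns, not how it is implemented.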

I have recorded a comparison between using the predicate and using key elements, posted it to the list (msg01066.html), and found that using the key function is a much better solution. Jeni replied with a favorable opinion on using a key on the same document *multiple* times (msg01072.html).

Now I would like to share another real case with those of you who are interested, and I hope to get your expert opinion.
The case involves a.xml with about 2000 z:row records and b.xml of the same size. The task is to:
1.  normalize the space in each attribute of each z:row in a.xml;
2.  copy attributes from b.xml and add them to a.xml, matching rows on the common SalesOrderNo attribute value.

First, with the predicate shown below, processing is very slow.

<xsl:template match="@SalesOrderNo">
<xsl:variable name="sno" select="normalize-space(.)"/>
<xsl:attribute name="SalesOrderNo">
<xsl:value-of select="$sno"/>
</xsl:attribute>
<!-- using a predicate is unacceptably slow compared with
payment-customer2.xsl, where the key function is used instead -->
<xsl:apply-templates
select="$MSource[normalize-space(@SalesOrderNo)=$sno]/@CustomerCode"
mode="merge"/>
</xsl:template>
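A rough count suggests why this template slows down so sharply. Assuming the processor evaluates the predicate as a plain scan of $MSource (about 2000 rows) for each of the roughly 2000 SalesOrderNo attributes in a.xml, and assuming a key is served from a hash-like index (2000 entries built once, then one lookup per a.xml row), the work compares as:

```latex
N_{\text{predicate}} \approx 2000 \times 2000 = 4\,000\,000
\qquad\text{vs.}\qquad
N_{\text{key}} \approx 2000 + 2000 = 4000
```

Under these assumptions the predicate cost grows with the product of the two document sizes, while the key cost grows roughly with their sum.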


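So I changed to a key-based solution using the following major steps:
1.  Build a more direct relation from b.xml. Its rows are copied with their SalesOrderNo, CustomerCode and CustomerName attributes trimmed by the mode="merge" template, so the salesorderno key (use="@SalesOrderNo") matches the trimmed $sno in the same way the normalize-space() predicate did.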

<xsl:variable name="aa" >
<xsl:for-each select="$MSource">
<z:row>
<xsl:apply-templates select="@SalesOrderNo|@CustomerCode|@CustomerName"
mode="merge"/>
</z:row>
</xsl:for-each>
</xsl:variable>
<xsl:variable name="originalDoc" select="msxsl:node-set($aa)"/>

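2.  Apply the key function. Since key() only searches the document that contains the context node, xsl:for-each first makes $originalDoc the context: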

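<xsl:template match="@SalesOrderNo">
<xsl:variable name="sno" select="normalize-space(.)"/>
<xsl:attribute name="SalesOrderNo">
<xsl:value-of select="$sno"/>
</xsl:attribute>
<!-- switch the context to the node-set built in step 1 so that key() searches it -->
<xsl:for-each select="$originalDoc">
<xsl:variable name="kk" select="key('salesorderno',$sno)"/>
<xsl:attribute name="CustomerCode">
<xsl:value-of select="$kk/@CustomerCode"/>
</xsl:attribute>
<xsl:attribute name="CustomerName">
<xsl:value-of select="$kk/@CustomerName"/>
</xsl:attribute>
</xsl:for-each>
</xsl:template>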

The final result is produced much faster.

This convinces me that, when handling a large number of records, it is worthwhile to learn more about the key function and the data scope of the current node, rather than relying on a simple and easily understood predicate pattern.
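The steps above reach b.xml through an intermediate node-set built with msxsl:node-set(). A possibly simpler variant, sketched here and not taken from the post, puts normalize-space() into the key's use expression and switches the context to document('b.xml') directly. It assumes the z:row elements in a.xml have no child content; the key name so-trimmed, the variable bDoc and the rows wrapper element are hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="#RowsetSchema">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical key: index rows by their trimmed SalesOrderNo, so the
       indexed value and the lookup value are normalized the same way -->
  <xsl:key name="so-trimmed" match="z:row" use="normalize-space(@SalesOrderNo)"/>

  <!-- Root node of b.xml, used to switch the context before calling key() -->
  <xsl:variable name="bDoc" select="document('b.xml')"/>

  <xsl:template match="/">
    <!-- Hypothetical wrapper so the output has a single root element -->
    <rows>
      <xsl:apply-templates select="//z:row"/>
    </rows>
  </xsl:template>

  <xsl:template match="z:row">
    <z:row>
      <!-- Task 1: trimmed copy of every attribute of the a.xml row -->
      <xsl:for-each select="@*">
        <xsl:attribute name="{name()}">
          <xsl:value-of select="normalize-space(.)"/>
        </xsl:attribute>
      </xsl:for-each>
      <!-- Task 2: add customer attributes from the matching b.xml row -->
      <xsl:variable name="sno" select="normalize-space(@SalesOrderNo)"/>
      <xsl:for-each select="$bDoc">
        <!-- Context is now b.xml, so key() searches b.xml rather than a.xml -->
        <xsl:variable name="match" select="key('so-trimmed', $sno)"/>
        <xsl:attribute name="CustomerCode">
          <xsl:value-of select="normalize-space($match/@CustomerCode)"/>
        </xsl:attribute>
        <xsl:attribute name="CustomerName">
          <xsl:value-of select="normalize-space($match/@CustomerName)"/>
        </xsl:attribute>
      </xsl:for-each>
    </z:row>
  </xsl:template>

</xsl:stylesheet>
```

Dropping the intermediate node-set also keeps the stylesheet free of the MSXML-specific msxsl:node-set() extension.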

The complete listing is attached below.

sfyang


sfyang@xxxxxxxxxxxxx

<?xml version="1.0" encoding="big5"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema"
exclude-result-prefixes="s  dt  msxsl rs  z">

<xsl:output method="xml" indent="yes"/>
<xsl:key name="salesorderno" match="z:row" use="@SalesOrderNo"/>
<xsl:variable name="MSource" select="document('b.xml')//z:row"/>

<xsl:variable name="aa" >
<xsl:for-each select="$MSource">
<z:row>
<xsl:apply-templates select="@SalesOrderNo|@CustomerCode|@CustomerName"
mode="merge"/>
</z:row>
</xsl:for-each>
</xsl:variable>
<xsl:variable name="originalDoc" select="msxsl:node-set($aa)"/>

<xsl:template match="/">
<xsl:variable name="rtf-zs">
<xsl:apply-templates select="//z:row" mode="n" />
</xsl:variable>
<xsl:variable name="zz" select="msxsl:node-set($rtf-zs)/z:row"/>
zz:<xsl:copy-of select="$zz"/>
</xsl:template>

<xsl:template match="z:row" mode="n">
<z:row>
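 <!-- @SalesOrderNo is handled here as well: its own template (default priority 0)
      outranks match="@*" (priority -0.5), so a second apply-templates is not needed -->
 <xsl:apply-templates select="@*|node()"/>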
</z:row>
</xsl:template>

<xsl:template match="@*">
<xsl:attribute name="{name()}">
<xsl:value-of select="normalize-space(.)"/>
</xsl:attribute>
</xsl:template>


<xsl:template match="@SalesOrderNo">
<xsl:variable name="sno" select="normalize-space(.)"/>
<xsl:attribute name="SalesOrderNo">
<xsl:value-of select="$sno"/>
</xsl:attribute>

<xsl:for-each select="$originalDoc">
<xsl:variable name="kk" select="key('salesorderno',$sno)"/>
<xsl:attribute name="CustomerCode">
<xsl:value-of select="$kk/@CustomerCode"/>
</xsl:attribute>
<xsl:attribute name="CustomerName">
<xsl:value-of select="$kk/@CustomerName"/>
</xsl:attribute>
</xsl:for-each>
</xsl:template>

<xsl:template match="@*" mode="merge">
<xsl:attribute name="{name()}">
<xsl:value-of select="normalize-space(.)"/>
</xsl:attribute>
</xsl:template>

</xsl:stylesheet>


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

