Re: [xsl] Speeding up processing (with sablotron or saxon)

Subject: Re: [xsl] Speeding up processing (with sablotron or saxon)
From: "TDarksword" <tdarksword@xxxxxxxxxxxx>
Date: Tue, 13 Jul 2004 15:57:44 +0100
----- Original Message ----- 
From: "Wendell Piez" <wapiez@xxxxxxxxxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Tuesday, July 13, 2004 12:03 AM
Subject: Re: [xsl] Speeding up processing (with sablotron or saxon)


> Hi,
>
> At 01:33 PM 7/12/2004, you wrote:
> >ok I have a piece of XSLT that processes a large XML file into smaller
> >chunks. The problem I have is that the deeper down into the XML file I am
> >processing the longer it takes. Is this just due to the way XSLT parsers
> >work or can I tweak my XSL file so it processes faster?
> >
> >I get the same effect when I used to process the file as one pass using
> >Saxon Result:document as I do processing as separate XSL files with either
> >Saxon or Sablotron.
> >
> >
> >This is the separate XSL file:- (Change the server[@name='Ahazi'] as
> >needed)
> ><?xml version="1.0"?>
> ><xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> >version="1.0">
> ><xsl:output method="xml" indent='yes' encoding="utf-8"/>
> >
> ><xsl:template match="server" />
> ><xsl:template match="server[@name='Ahazi']">
> ><resources>
> ><xsl:for-each
> >select=".//resource[not(@swgcraft_id=preceding::*/@swgcraft_id)]">
>
> ... this for-each is expensive. You are traversing the entire document
> looking for 'resource' elements; each one you find is examined by looking
> at all its preceding elements and comparing their @swgcraft_id attributes.
> When you have lots of elements, lots and lots of them are compared. (n^2
> performance.)
>
> Since this happens every time the template is matched (which could itself
> be lots of times), it adds up -- especially for the later nodes in your set
> (as you noticed).
>
> An easy tweak to improve performance would be to use keys to de-duplicate
> instead of doing it by hand on the preceding:: axis.
>
> So:
>
> <xsl:key name="resource-by-id" match="resource" use="@swgcraft_id"/>
>
> <xsl:variable name="resources" select="//resource"/>
> (binding //resource to a variable $resources so we don't have to retrieve it
> every single time)
>
> then you can deduplicate in another variable declaration:
>
> <xsl:variable name="unique-resources"
>     select="$resources[count(.|key('resource-by-id',@swgcraft_id)[1]) = 1]"/>
>
> In English: $unique-resources is the collection of all resources which,
> when counted along with the first resource with the same swgcraft_id as
> themselves, amount to a single node (which is true only of the first one
> with each swgcraft_id).
>
> This ought to help quite a bit.
>
> Cheers,
> Wendell
>

So I'd replace the:-
<xsl:for-each
select=".//resource[not(@swgcraft_id=preceding::*/@swgcraft_id)]">

with

<xsl:key name="resource-by-id" match="resource" use="@swgcraft_id"/>
<xsl:variable name="resources" select="//resource"/>
<xsl:variable name="unique-resources"
     select="$resources[count(.|key('resource-by-id',@swgcraft_id)[1]) = 1]"/>

but I guess I still need some form of for-each statement too?
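
My guess is that it fits together roughly like the sketch below. It is untested and rests on a few assumptions: xsl:key is a top-level element, so it sits directly under xsl:transform rather than inside the template; the key test is applied straight to .//resource in the for-each; and the xsl:copy-of body is only a hypothetical placeholder, since the real body of my for-each isn't shown above. Note that the key() call has to use the same name as the xsl:key declaration (resource-by-id), and the predicate keeps a resource when the count equals 1, i.e. when it is the first node the key returns for its swgcraft_id.

```xml
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               version="1.0">
<xsl:output method="xml" indent="yes" encoding="utf-8"/>

<!-- Index every resource element by its swgcraft_id attribute.
     xsl:key is only allowed at the top level of the stylesheet. -->
<xsl:key name="resource-by-id" match="resource" use="@swgcraft_id"/>

<xsl:template match="server"/>

<xsl:template match="server[@name='Ahazi']">
  <resources>
    <!-- Keep a resource only if it is the first node in the document
         that the key returns for its swgcraft_id (de-duplication by key
         instead of scanning the preceding:: axis). -->
    <xsl:for-each
        select=".//resource[count(.|key('resource-by-id', @swgcraft_id)[1]) = 1]">
      <!-- Placeholder body (assumption): replace with the real output. -->
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </resources>
</xsl:template>

</xsl:transform>
```

Written this way the separate $resources variable shouldn't be needed, because the predicate is applied to the resources under the matched server. As with the preceding:: version, the key looks across the whole document, so a resource whose id first appears under an earlier server would still be skipped here.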

TIA Tony
