RE: [xsl] Optimizing XSLT iteration

Subject: RE: [xsl] Optimizing XSLT iteration
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 8 Oct 2007 04:59:51 +0100
Thanks for the further data (sent offlist).

Your chart seems to show quadratic performance.

Your 10mb transformation takes 29secs on my machine; that's 0.3Mb/sec
compared with 1.5Mb/sec for the smaller data file, which is consistent with
the idea of it being quadratic, as far as one can tell from two data points.


Success: I've found the cause by looking at the Java CPU profile. The
culprit is the calls on xsl:number at line 42 and 128. There are some
circumstances in which Saxon optimizes xsl:number, but where it can't do so,
it does essentially what the spec describes: each call on xsl:number has to
count all the previous elements, which immediately gives you O(n^2)
performance.

You're using xsl:number to generate a unique id. I don't know what the
constraints are on this id, but you can probably solve the problem using
generate-id() or position(). If it absolutely must be the sequential number
of the locus element within the entire file, consider using a prepass that
numbers the elements (attaching the number as an extra attribute).

It's probably time I looked again at better optimizations for xsl:number -
this hasn't been touched in a long time. The current optimization
essentially looks at the previous execution of the same xsl:number
instruction, and if that was numbering the previous node, it adds one to the
previous number. This doesn't kick in in your case because you are only
numbering every fifth node.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Michael Kay [mailto:mike@xxxxxxxxxxxx] 
> Sent: 08 October 2007 03:03
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: RE: [xsl] Optimizing XSLT iteration
> 
> Thanks for sending your sample data (offlist).
> 
> I've fixed various errors in your source file to make it well-formed.
> Running the transformation 100 times in Saxon I get
> 
> *** Average execution time over 100 runs: 26ms
> 
> which seems entirely reasonable.
> 
> It's dangerous to scale up from something quite so small, but 
> 38Kb in 26ms scales up to 76Mb in 52s if the performance is 
> linear, and I can't really see why it shouldn't be. The 
> processing in this stylesheet is a bit more complex than the 
> previously posted sample (includes string-to-number 
> conversion and arithmetic) but it looks linear.
> 
> Michael Kay
> http://www.saxonica.com
>   
> 
> > -----Original Message-----
> > From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
> > Sent: 08 October 2007 01:49
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: RE: [xsl] Optimizing XSLT iteration
> > 
> > > I just tried Saxon parser after increasing the initial/max
> > Java heap
> > > space to 512M and it worked with even a 74MB file, but took
> > about 28
> > > minutes to finish.
> > 
> > That does seem quite high, given that Saxon can do an identity 
> > transform on a file of that size in something close to 28 
> seconds. I 
> > wonder if there is some other problem in your code (are there parts 
> > you haven't shown us?). Is the performance linear with 
> document size?
> > 
> > Michael Kay
> > http://www.saxonica.com/

Current Thread