Re: [xsl] Optimizing XSLT iteration

Subject: Re: [xsl] Optimizing XSLT iteration
From: "Sujata Gohad" <sgohad@xxxxxxx>
Date: Sun, 7 Oct 2007 21:13:18 -0700
Hi Mike,

Very useful insights. Thanks a lot for taking pains to look at this in
that detail! :-)

I will try to rework the said parts of the XSLT. Do you expect to see
O(n) numbers once the optimization is in place? Just wondering.

Also, what tools are your favorite for Java profiling and/or XSLT
profiling. It will be very useful to know what the Saxon almighty
himself uses. :-)

Many thanks,
- Sujata



On 10/7/07, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> Thanks for the further data (sent offlist).
>
> Your chart seems to show quadratic performance.
>
> Your 10mb transformation takes 29secs on my machine; that's 0.3Mb/sec
> compared with 1.5Mb/sec for the smaller data file, which is consistent with
> the idea of it being quadratic, as far as one can tell from two data points.
>
>
> Success: I've found the cause by looking at the Java CPU profile. The
> culprit is the calls on xsl:number at line 42 and 128. There are some
> circumstances in which Saxon optimizes xsl:number, but where it can't do so,
> it does essentially what the spec describes: each call on xsl:number has to
> count all the previous elements, which immediately gives you O(n^2)
> performance.
>
> You're using xsl:number to generate a unique id. I don't know what the
> constraints are on this id, but you can probably solve the problem using
> generate-id() or position(). If it absolutely must be the sequential number
> of the locus element within the entire file, consider using a prepass that
> numbers the elements (attaching the number as an extra attribute).
>
> It's probably time I looked again at better optimizations for xsl:number -
> this hasn't been touched in a long time. The current optimization
> essentially looks at the previous execution of the same xsl:number
> instruction, and if that was numbering the previous node, it adds one to the
> previous number. This doesn't kick in in your case because you are only
> numbering every fifth node.
>
> Michael Kay
> http://www.saxonica.com/
>
> > -----Original Message-----
> > From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
> > Sent: 08 October 2007 03:03
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: RE: [xsl] Optimizing XSLT iteration
> >
> > Thanks for sending your sample data (offlist).
> >
> > I've fixed various errors in your source file to make it well-formed.
> > Running the transformation 100 times in Saxon I get
> >
> > *** Average execution time over 100 runs: 26ms
> >
> > which seems entirely reasonable.
> >
> > It's dangerous to scale up from something quite so small, but
> > 38Kb in 26ms scales up to 76Mb in 52s if the performance is
> > linear, and I can't really see why it shouldn't be. The
> > processing in this stylesheet is a bit more complex than the
> > previously posted sample (includes string-to-number
> > conversion and arithmetic) but it looks linear.
> >
> > Michael Kay
> > http://www.saxonica.com
> >
> >
> > > -----Original Message-----
> > > From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
> > > Sent: 08 October 2007 01:49
> > > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > > Subject: RE: [xsl] Optimizing XSLT iteration
> > >
> > > > I just tried Saxon parser after increasing the initial/max
> > > Java heap
> > > > space to 512M and it worked with even a 74MB file, but took
> > > about 28
> > > > minutes to finish.
> > >
> > > That does seem quite high, given that Saxon can do an identity
> > > transform on a file of that size in something close to 28
> > seconds. I
> > > wonder if there is some other problem in your code (are there parts
> > > you haven't shown us?). Is the performance linear with
> > document size?
> > >
> > > Michael Kay
> > > http://www.saxonica.com/

Current Thread