RE: XSL Performance

Subject: RE: XSL Performance
From: Kay Michael <Michael.Kay@xxxxxxx>
Date: Tue, 6 Jun 2000 14:15:12 +0100
> | 1. Keep the source documents small. If necessary split the document
> | first.
> | [...]
> | 8. Split complex transformations into several stages.
> I find these two tips interesting. What, exactly, do you mean with the
> first one? Should one split big documents into several smaller pieces
> and then process each piece in turn? If so, why?

Mainly to avoid thrashing virtual memory and to reduce the burden on the
garbage collector. This typically becomes important with documents above,
say, 3MB. I suspect that many transformations on large documents have a
serial nature: the nth "chunk" of the output depends only on the nth "chunk"
of the input. A nice idea is to write a SAX filter which presents each
"chunk" to the XSLT processor in turn, as if it were the whole document.

> Also, is the intention behind 8. to simplify the transformations and
> thereby make them more efficient?

Yes. I can't really quantify this, but the thinking is that some operations
like grouping, numbering and sorting can be quite complex, and if you try to
do them all at the same time you're much more likely to end up with
n-squared algorithms. For example, adding sequence numbers to the nodes in a
pre-pass (using position()) may be much more efficient than calculating the
numbers later using <xsl:number/>.
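
As a rough sketch of such a numbering pre-pass (the element names "list" and
"item" and the attribute name "seq" are made up for illustration, and the
sketch assumes each <list> contains only <item> children):

```xml
<?xml version="1.0"?>
<!-- Pass 1 (sketch): copy the input, stamping each <item> with its position. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy anything not matched more specifically. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical container element. Selecting only the <item> children
       makes position() count items rather than whitespace text nodes.
       Any non-<item> children of <list> are dropped by this sketch. -->
  <xsl:template match="list">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="item">
        <xsl:copy>
          <!-- The sequence number is computed once, here. -->
          <xsl:attribute name="seq">
            <xsl:value-of select="position()"/>
          </xsl:attribute>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

A later stage could then read the stored number with
<xsl:value-of select="@seq"/> instead of recomputing it with <xsl:number/>.
Whether this is actually faster will depend on the processor and the data.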

> | 8. To output the text value of a simple #PCDATA element, use
> |    <xsl:value-of> in preference to <xsl:apply-templates>.
> I suppose using xsl:for-each instead of xsl:apply-templates is another
> example of basically the same technique.

Possibly, though I doubt the difference is as big. I'd guess that the
path length (and more importantly, the number of objects created) when using
<xsl:apply-templates/> to process a simple PCDATA element using the built-in
text() template is five to ten times that of using <xsl:value-of/>. That's
my estimate for Saxon, anyway: I'd be surprised if it's different for other
processors unless they optimise this as a special case.
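
To make the comparison concrete, here is a minimal sketch. It assumes a
made-up input of the form <book><title>Some text</title></book>, where <title>
holds only #PCDATA and occurs once, and where the stylesheet defines no
template of its own for <title>:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "book" and "title" are hypothetical element names. -->
  <xsl:template match="book">

    <!-- Direct route: writes the string value of <title> straight to the
         output, with no template matching involved. -->
    <h1><xsl:value-of select="title"/></h1>

    <!-- Indirect route, same text in the output: the processor first finds
         the built-in element rule for <title>, which applies templates to
         its children, and then the built-in text() rule for the text node. -->
    <h2><xsl:apply-templates select="title"/></h2>

  </xsl:template>

</xsl:stylesheet>
```

Both lines emit the same text for this input; the difference is only in how
much work the processor does to get there, and how large that difference is
will vary between processors.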

But in publishing this list, I was sticking my neck out. I can't justify all
the assertions; I just thought that people might be interested in my guesses
even if they're wrong.

Mike K
