RE: [xsl] XPath "//", speed, and Saxon

Subject: RE: [xsl] XPath "//", speed, and Saxon
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 31 Oct 2008 16:45:09 -0000
Unfortunately it's very difficult to give good performance advice that
applies across all processors.

There are several things Saxon does with leading // that are relevant.

Firstly, if you're running a schema-aware transformation and the type of the
context node is known, Saxon-SA will rewrite //z as /a/b/c/d/z if it can. It
can't always, of course: for example, not if the structure is recursive. (As it
happens, this isn't always a good optimization: it's good when the z elements
are few and localized, but bad when they are many and can appear anywhere.)
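To make the "can't always" point concrete, here is a minimal Python sketch of the idea behind such a rewrite. It is not how Saxon-SA works internally. It assumes a hypothetical, simplified content model (a dict mapping each element name to the child names it allows) and enumerates every child-axis path from the document element to z. If any element can contain itself, the depth of z is unbounded and the sketch gives up, leaving //z alone. `rewrite_descendant`, `SCHEMA` and `RECURSIVE` are illustrative names only.

```python
# Sketch of rewriting //target into explicit child paths using a
# hypothetical content model (element name -> allowed child names).
# This is NOT Saxon-SA's algorithm or type representation.


class RecursiveStructure(Exception):
    """Raised when an element can (directly or indirectly) contain itself."""


def rewrite_descendant(schema, root, target):
    """Return a path expression equivalent to //target, or None if the
    rewrite is not possible (recursive structure, or target never occurs)."""
    paths = []

    def walk(name, ancestors):
        if name in ancestors:
            # Recursion: target may sit at any depth, so no finite set of
            # child paths can replace //target. Conservatively, this also
            # gives up on recursive branches that cannot lead to target.
            raise RecursiveStructure(name)
        here = ancestors + [name]
        if name == target:
            paths.append("/" + "/".join(here))
        for child in schema.get(name, ()):
            walk(child, here)

    try:
        walk(root, [])
    except RecursiveStructure:
        return None
    # Several possible locations become an XPath union of explicit paths.
    return " | ".join(paths) if paths else None


SCHEMA = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["z"], "z": []}
RECURSIVE = {"doc": ["section"], "section": ["section", "z"], "z": []}

print(rewrite_descendant(SCHEMA, "a", "z"))       # /a/b/c/d/z
print(rewrite_descendant(RECURSIVE, "doc", "z"))  # None
```

The trade-off Michael mentions shows up in the union case: when z can appear in many places, the rewritten expression turns into many explicit paths, and that can easily cost more than one straightforward scan.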

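Secondly, //z, which is shorthand for /descendant-or-self::node()/child::z,
is rewritten as /descendant::z if there's no positional predicate. (With a
positional predicate the two are not equivalent: //z[1] selects every z that
is the first z child of its parent, whereas /descendant::z[1] selects only the
first z in the document.)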
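The reason the positional predicate blocks this rewrite can be checked with Python's `xml.etree.ElementTree`, whose limited XPath support evaluates `[1]` per parent, just as XPath's //z[1] does. The sample document below is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample: two z children under b, one under c.
doc = ET.fromstring('<a><b><z id="1"/><z id="2"/></b><c><z id="3"/></c></a>')

# //z[1]: the predicate applies within each parent's children, so this
# selects every z that is the first z child of its parent.
first_child_z = [z.get("id") for z in doc.findall(".//z[1]")]

# /descendant::z[1]: the predicate applies to the whole descendant
# sequence of the document, in document order, so only one z qualifies.
first_descendant_z = [z.get("id") for z in doc.iter("z")][:1]

print(first_child_z)       # ['1', '3']
print(first_descendant_z)  # ['1']
```

Without a positional predicate both forms select the same set of nodes, which is why the rewrite is safe in that case.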

Finally, for any given document, /descendant::z is implemented as a memo
function: the first time you execute it (for a particular choice of z and a
particular document) the document is scanned, but the result is retained and
is reused if you use the same expression again.
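A memo function of this kind can be sketched in a few lines of Python. This is a sketch of the general technique, not Saxon's code. `Document`, `descendants_named` and the `scans` counter are hypothetical names. Caching is only safe because the tree never changes after parsing, which holds for XSLT source documents.

```python
import xml.etree.ElementTree as ET


class Document:
    """A parsed source document with a memoized /descendant::name lookup.

    Memoizing is safe only because the tree is never modified after
    parsing (XSLT source documents are immutable)."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)
        self._memo = {}  # element name -> matching elements, document order
        self.scans = 0   # number of full-document scans actually performed

    def descendants_named(self, name):
        """Equivalent of /descendant::name evaluated against this document."""
        if name not in self._memo:
            self.scans += 1
            # iter() walks the whole tree in document order, including the
            # document element itself, as /descendant:: does.
            self._memo[name] = tuple(self.root.iter(name))
        return self._memo[name]


doc = Document("<a><z/><b><z/><y/></b></a>")
print(len(doc.descendants_named("z")), doc.scans)  # 2 1
print(len(doc.descendants_named("z")), doc.scans)  # 2 1  (served from memo)
print(len(doc.descendants_named("y")), doc.scans)  # 1 2  (new name, new scan)
```

This also suggests why Lars saw no difference after switching to keys: once the first //foo has paid for one scan, later evaluations cost about as much as a key lookup.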

It's also worth pointing out that /descendant::z is very fast on the
TinyTree (Saxon's default tree model) anyway. Even if you've got 500,000 nodes in your document, it
doesn't take very long to scan an array of 500,000 integers and test each
one for equality to some constant. Sure, it's linear with the size of the
document, but the actual search speed per megabyte is probably 1000 times
faster than parsing or serializing.
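The cost argument can be illustrated with a sketch loosely based on that description. Nodes are numbered in document order, and each node's element name is interned as a small integer, so finding every z comes down to one pass over an integer array that compares against a constant. This is an assumption-laden simplification, not the TinyTree's real layout: it tracks element names only, and `build_name_codes` and `descendant_by_name` are hypothetical names.

```python
from array import array


def build_name_codes(element_names):
    """Intern each element name as a small integer; return the name pool
    plus one integer code per node, in document order."""
    pool = {}
    codes = array("i", (pool.setdefault(n, len(pool)) for n in element_names))
    return pool, codes


def descendant_by_name(pool, codes, name):
    """Node numbers of all elements called `name`: a single linear pass
    comparing integers, never strings."""
    target = pool.get(name)
    if target is None:
        return []  # the name never occurs in this document
    return [i for i, code in enumerate(codes) if code == target]


# Synthetic 500,000-node "document" with hypothetical element names.
names = ["row" if i % 10 else "z" for i in range(500_000)]
pool, codes = build_name_codes(names)
hits = descendant_by_name(pool, codes, "z")
print(len(hits))  # 50000
```

The scan is still linear in document size, as Michael says. But each step is a single integer comparison, much cheaper per node than the character-level work of parsing or serializing.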

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Lars Huttar [mailto:huttarl@xxxxxxxxx] 
> Sent: 31 October 2008 14:28
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] XPath "//", speed, and Saxon
> 
> Hello,
> 
> I was recently trying to solve performance problems in an 
> XSLT-heavy web application, and came up against results that 
> puzzled me with regard to XSLT optimization.
> 
> We have a Cocoon pipeline in which about 5MB of XML data is 
> being fed through a particular XSLT stylesheet (one in a 
> series). And I thought that this stylesheet was the reason 
> for the pipeline taking forever to run. I looked in it and 
> found several uses of XPaths containing an initial 
> double-slash, e.g. select="//foo", some of them being invoked 
> multiple times.
> 
> I figured that for a simple XSLT processor, each "//foo" 
> expression could mean traversing the whole input DOM again, 
> which would be expensive for a big input.
> 
> So I went through and converted the "//foo" expressions to use keys.
> Excited at how much faster I expected the stylesheet to run, 
> I ran some tests ... pretty fast. The process completed in 
> just under 2 seconds.
> But then I ran an apples-to-apples test on the old version of 
> the stylesheet, the one with lots of "//foo" in it. And to my 
> surprise, the old version ran just as fast. After several 
> test runs I could see no appreciable difference in speed.
> 
> Obviously the performance problem was elsewhere. But the 
> question I wanted to ask here is, what does this imply 
> regarding good practices for writing efficient stylesheets?
> 
> Saxon of course is not a dumb XSLT processor. Maybe it 
> compiles the "//foo"-like XPath expressions into something 
> like keys without being told to... e.g. it indexes the DOM 
> tree by element name... and so you get good performance with 
> those expressions even on large inputs.
> 
> If so, does that optimization rely on the name of the 
> element, so that it would not apply to expressions like 
> "//*[...]"? That would suggest that for "//foo"-like 
> expressions, you're in good shape, but for expressions like 
> "//*" you should use a key for efficiency.
> 
> Thanks for any help and advice.
> 
> Lars
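Lars asks about expressions like //*[...], where the step is * filtered by a predicate rather than a single element name. Whatever a particular processor does with those, the idea behind switching to xsl:key can be sketched as a rough Python analogue. It is not how any XSLT processor implements keys. One pass over every element builds an index keyed by a computed value, and later lookups don't rescan the tree. `build_key` and its `use` callback are hypothetical names, chosen to echo the `match` and `use` attributes of xsl:key.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_key(root, use):
    """Rough analogue of <xsl:key name="..." match="*" use="...">: one pass
    over every element, indexed by whatever value `use` computes."""
    index = defaultdict(list)
    for elem in root.iter():  # match="*": every element, in document order
        value = use(elem)
        if value is not None:
            index[value].append(elem)
    return index


doc = ET.fromstring('<r><p ref="x"/><q ref="y"><s ref="x"/></q></r>')

# Evaluating //*[@ref = $v] naively rescans the tree on every call;
# here the index is built once and each lookup is a dictionary access.
by_ref = build_key(doc, lambda e: e.get("ref"))
print([e.tag for e in by_ref["x"]])  # ['p', 's']
print([e.tag for e in by_ref["y"]])  # ['q']
print(by_ref.get("z", []))           # []
```

A key only pays off when the same kind of lookup is repeated enough times to amortize the one-off indexing pass.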
