Re: [xsl] XPath "//", speed, and Saxon

Subject: Re: [xsl] XPath "//", speed, and Saxon
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Fri, 31 Oct 2008 13:23:57 -0400

Lars,

I will let Mike report the specifics of Saxon's internal optimizations, but it does sound as if what you've discovered is that performance can vary radically depending on the processor used.

Even using Xalan, however (if that's what you're using in Cocoon), you might well find that an optimization of the form

<xsl:variable name="every-foo" select="//foo"/>

with references to $every-foo in place of each literal "//foo" in your code would work as well as a key, or nearly so. This depends, of course, on the details of what is actually done with these nodes.
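
A minimal sketch of that variable-based form, assuming XSLT 1.0. The type attribute on foo and the summary, total and urgent output elements are hypothetical, made up for illustration and not taken from your stylesheet:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level variable: its select expression is evaluated with the
       root node of the source document as the context node. A processor
       would normally bind it at most once per transformation. -->
  <xsl:variable name="every-foo" select="//foo"/>

  <xsl:template match="/">
    <summary>
      <!-- Each reference reuses the same node-set instead of asking
           for a fresh "//foo" traversal. -->
      <total>
        <xsl:value-of select="count($every-foo)"/>
      </total>
      <!-- Hypothetical filter on an assumed type attribute. -->
      <urgent>
        <xsl:value-of select="count($every-foo[@type = 'urgent'])"/>
      </urgent>
    </summary>
  </xsl:template>

</xsl:stylesheet>
```

One caveat with this form: $every-foo holds only the foo elements of the main source document, whereas a literal "//foo" is evaluated against whichever document contains the context node. If the stylesheet also reads other documents via document(), the two are not interchangeable there.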

In general, your observation also demonstrates that what's good practice for one situation may not be good practice for another. Sometimes you know exactly which processor will always be used with your stylesheet; other times you don't. Sometimes your stylesheet will be maintained by others, and needs to be able to adapt over time; other times, it will be a black box. Performance is critical sometimes, but not always. Sometimes the stylesheet is developed to run only once. All of these factors affect what counts as "good practice". In effect, good practice means striking the right balance for the situation at hand.

Just as Solon told Croesus that no man should be judged happy until after he's dead, no stylesheet should be judged "good" until after it's retired.

Of course, this doesn't mean that your question isn't a legitimate concern. If only out of habit, I'd probably not write document-wide traversals like "//foo" except under controlled circumstances -- while also not worrying too much about performance constraints except when I actually face them.

I also probably wouldn't write to particular processors' optimizations except when reasonably certain that I could rely on them.

Cheers,
Wendell

At 10:27 AM 10/31/2008, you wrote:
Hello,

I was recently trying to solve performance problems in an XSLT-heavy web
application, and came up against results that puzzled me with regard to
XSLT optimization.

We have a Cocoon pipeline in which about 5MB of XML data is being fed
through a particular XSLT stylesheet (one in a series). And I thought
that this stylesheet was the reason for the pipeline taking forever to
run. I looked in it and found several uses of XPaths containing an
initial double-slash, e.g. select="//foo", some of them being invoked
multiple times.

I figured that for a simple XSLT processor, each "//foo" expression
could mean traversing the whole input DOM again, which would be
expensive for a big input.

So I went through and converted the "//foo" expressions to use keys.
Excited at how much faster I expected the stylesheet to run, I ran some
tests ... pretty fast. The process completed in just under 2 seconds.
But then I ran an apples-to-apples test on the old version of the
stylesheet, the one with lots of "//foo" in it. And to my surprise, the
old version ran just as fast. After several test runs I could see no
appreciable difference in speed.
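
For reference, a conversion of this kind might look like the following
sketch in XSLT 1.0. The key name foo-by-id, the id and target
attributes, and the ref element are all hypothetical, chosen only to
illustrate the pattern; they are not the names used in the stylesheet
described above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical key: indexes every foo element by its id attribute. -->
  <xsl:key name="foo-by-id" match="foo" use="@id"/>

  <xsl:template match="ref">
    <!-- Before: <xsl:apply-templates select="//foo[@id = current()/@target]"/> -->
    <!-- After: look the matching foo elements up through the key. -->
    <xsl:apply-templates select="key('foo-by-id', @target)"/>
  </xsl:template>

</xsl:stylesheet>
```

A key pays off mainly when the "//foo" expression carries a predicate
that compares a value, as in the "before" comment here. For a bare
"//foo" with no predicate there is no lookup value to index on.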

Obviously the performance problem was elsewhere. But the question I
wanted to ask here is, what does this imply regarding good practices for
writing efficient stylesheets?

Saxon of course is not a dumb XSLT processor. Maybe it compiles the
"//foo"-like XPath expressions into something like keys without being
told to... e.g. it indexes the DOM tree by element name... and so you
get good performance with those expressions even on large inputs.

If so, does that optimization rely on the name of the element, so that
it would not apply to expressions like "//*[...]"? That would suggest
that for "//foo"-like expressions, you're in good shape, but for
expressions like "//*[...]" you should use a key for efficiency.
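
As a sketch of what a key for the wildcard case could look like,
assuming a purely hypothetical predicate on a status attribute (the
actual predicate is not given here), in XSLT 1.0:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical key: indexes elements of any name by an assumed
       status attribute. Elements without that attribute contribute
       no entries. -->
  <xsl:key name="element-by-status" match="*" use="@status"/>

  <xsl:template match="/">
    <drafts>
      <!-- Stands in for select="//*[@status = 'draft']". -->
      <xsl:copy-of select="key('element-by-status', 'draft')"/>
    </drafts>
  </xsl:template>

</xsl:stylesheet>
```

Whether this is actually faster than the "//*[...]" form depends on the
processor and on how often the lookup is repeated, so it is worth
measuring both versions rather than assuming.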

Thanks for any help and advice.


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
