Subject: RE: [xsl] use-when attribute?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Sun, 19 Dec 2004 22:52:23 -0000

> One stylesheet defines a global variable select="//a|//b|//c"
> and another similarly a key (or three with the same name)
> with match="a|b|c" use="'all'".
>
> The root template outputs a root node containing the count of
> either the global variable (test case 1) or of the key
> (test case 2).

Consider count(//a|//b|//c).

A completely naive implementation will build three node-sets in memory, the
result of //a, //b, and //c respectively, will then combine these into a
single node-set in memory, sorting and eliminating duplicates as it goes,
and will then count the number of nodes in the combined node-set.
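
To make the naive strategy concrete, here is a minimal Python sketch of it.
It is only an illustration, not how any particular XSLT processor is
written. The function name naive_count and the use of Python's
xml.etree.ElementTree as the document model are assumptions made for the
example.

```python
import xml.etree.ElementTree as ET

def naive_count(root, names=("a", "b", "c")):
    """Count //a|//b|//c the naive way (hypothetical helper)."""
    # Document-order position of every element, needed for the sort step.
    order = {id(el): i for i, el in enumerate(root.iter())}
    # Step 1: materialize one node-set per path expression (//a, //b, //c).
    node_sets = [list(root.iter(name)) for name in names]
    # Step 2: combine them, eliminating duplicates by node identity.
    combined = {id(el): el for node_set in node_sets for el in node_set}
    # Step 3: sort the combined node-set into document order.
    union = sorted(combined.values(), key=lambda el: order[id(el)])
    # Step 4: only now count the nodes.
    return len(union)

doc = ET.fromstring("<r><a/><b><a/><c/></b><d/></r>")
print(naive_count(doc))
```

Every intermediate list here is held in memory at once, which is exactly
the cost the optimizations below avoid.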

But there are many fairly easy optimizations possible:

(a) the expression //a|//b|//c can be rewritten as //*[self::a or self::b or
self::c]. This then performs a single scan of the document, with no need to
eliminate duplicates.

(b) count() doesn't need to materialize the node-set in memory: it can count
the nodes as they are found (this is called pipelining).

(c) the system can recognize that expressions such as //a deliver results in
sorted order without doing an explicit sort, so the union operation (|)
between such node-sets can be done by a merge operation without any need for
a sort.

(d) a union operation can be pipelined: there is no need to materialize //a
and //b in memory in order to form their union.

(e) the system can recognize that //a and //b are disjoint, so
count(//a|//b) is the same as count(//a) + count(//b).
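
Points (c) and (d) can be sketched in Python as a lazy merge of streams that
are already in document order. This is an illustration under assumptions,
not any processor's actual code: the helper names positioned and
merged_union are hypothetical, and xml.etree.ElementTree stands in for the
source tree.

```python
import heapq
import xml.etree.ElementTree as ET

def positioned(root, name):
    # Lazily yield (document position, element) for each element called
    # `name`. A depth-first scan already delivers document order, so no
    # explicit sort is needed.
    return ((pos, el) for pos, el in enumerate(root.iter()) if el.tag == name)

def merged_union(root, names):
    # Merge the sorted streams without materializing any of them.
    streams = [positioned(root, name) for name in names]
    last = -1
    for pos, el in heapq.merge(*streams, key=lambda item: item[0]):
        if pos != last:  # the same node arriving from two streams is skipped
            last = pos
            yield el

doc = ET.fromstring("<r><a/><b><a/><c/></b><d/></r>")
print([el.tag for el in merged_union(doc, ["a", "b", "c"])])
```

Each stream is a generator, so at any moment the merge holds only one
pending node per operand rather than three complete node-sets.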

In short, it isn't hard for a system to evaluate this expression in a single
scan, or perhaps three scans, and there is no need to allocate any memory
for temporary results.
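
As a rough Python sketch of that conclusion (again with
xml.etree.ElementTree standing in for the source document, and with
hypothetical function names), the single-scan form corresponds to
rewrite (a) combined with pipelining (b), and the three-scan form to the
disjointness rule (e):

```python
import xml.etree.ElementTree as ET

def single_scan_count(root, names=frozenset({"a", "b", "c"})):
    # One pass, like //*[self::a or self::b or self::c], counting nodes as
    # they are found instead of collecting them.
    return sum(1 for el in root.iter() if el.tag in names)

def three_scan_count(root, names=("a", "b", "c")):
    # count(//a) + count(//b) + count(//c): valid because elements with
    # different names can never be the same node.
    return sum(sum(1 for _ in root.iter(name)) for name in names)

doc = ET.fromstring("<r><a/><b><a/><c/></b><d/></r>")
print(single_scan_count(doc), three_scan_count(doc))
```

Neither function builds a list, a set, or a sorted copy of the matching
nodes.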

Even if the result of //a|//b|//c is assigned to a variable and you then do
count() on the variable, the system may be able to tell that it doesn't need
to allocate memory to hold the value (it might be able to see that it's only
used once).

Now consider the solution using keys. If you define a key using xsl:key, and
then use it in the key() function, you're declaring a fairly clear intent
that you want to build some kind of index to make repeated calls on key()
faster. Building an index is an expensive operation: it's only worth doing
if you are going to use it repeatedly. If you only use it once, it will
almost certainly be slower than doing the search directly. It's unlikely
that the system will notice that it's not worth building the index, because
xsl:key is intended as an explicit performance hint and the system would
usually assume that if the user says they want an index, they mean it.
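
The difference can be sketched in Python. This is a simplified model, not
a description of how any real processor implements xsl:key: the names
build_key_index, count_via_key and count_via_scan are hypothetical, and
the index is assumed to be a plain dictionary built eagerly on first use.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

NAMES = frozenset({"a", "b", "c"})

def build_key_index(root, names, use):
    # Scan the whole document and store every matching node under its
    # key value, as an index for match="a|b|c" might be built.
    index = defaultdict(list)
    for el in root.iter():
        if el.tag in names:
            index[use(el)].append(el)
    return index

def count_via_key(root):
    # Models count(key(..., 'all')) with use="'all'": every node lands in
    # the same bucket, so the index is just a stored copy of the node-set.
    index = build_key_index(root, NAMES, use=lambda el: "all")
    return len(index["all"])

def count_via_scan(root):
    # Direct search: the same scan, but nothing is stored.
    return sum(1 for el in root.iter() if el.tag in NAMES)

doc = ET.fromstring("<r><a/><b><a/><c/></b><d/></r>")
print(count_via_key(doc), count_via_scan(doc))
```

Both functions scan the document once and return the same number, but the
key version also pays for hashing and storing every node before the single
lookup, and that cost is recovered only if the index is consulted again.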

> Despite the explanation from Mike, I still don't _exactly_
> understand why the key solution is so much slower in this case
> (it doesn't fit with other experiences with keys, in which I
> reached a large gain by using them), but it is at least very
> obvious that it is _not_ wise to use a key with a fixed use
> pattern...
>

The main reason the key is slower in this case is that you are only using it
once. But there's no good reason for ever using a key with a fixed use
expression more than once, because you could always put the result of the
first evaluation in a variable. So yes, I can't see any reason for using
this construct.

Michael Kay
http://www.saxonica.com/
