RE: [xsl] Wath is the opposite of the union operator?

Subject: RE: [xsl] Wath is the opposite of the union operator?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 22 Sep 2005 17:03:20 +0100
> You can expensively code 
> A except B =>
>    A[count(.|B) != count(B)]
> A intersect B =>
>    A[count(.|B) = count(B)]
> Hi, Mike,
> If you don't mind waxing theoretical for a minute, what makes those 
> comparisons expensive?

Let's stick to "except", the same applies to "intersect".

It's reasonably easy for an optimizer to do the simple rewrite

let $c := count(B)
return A[count(.|B) != $c]

Saxon certainly does this, other processors might not. If B is a complex
expression with no dependency on the context node, Saxon will also evaluate
B outside the loop, so this reduces to

let $b := B
let $c := count(B)
return A[count(.|$b) != $c]

But that still leaves count(.|$b) inside the loop, to be evaluated once for
each item in A. For Saxon, this means scanning the node-set $b once for each
item in $a, giving overall performance O(count(A)*count(B)). By contrast,
the union, except, and intersect operators in XPath 2.0 are evaluated in
Saxon using a sort-merge strategy, giving performance of the order
O(N*log(N) + M*log(M)) where N is count(A) and M is count(B). In fact, the
sort will in most cases not be necessary since the node-sequence will often
be in document order already, giving performance O(count(A)+count(B)).

The sort-merge approach is also more efficient in memory. When the
expressions A and B deliver nodes in document order, all three operators can
be evaluated without breaking the pipeline: that is, it is not necessary to
allocate memory to hold the values of their operands. Memory allocation (and
de-allocation, through garbage collection) accounts for a significant part
of XSLT/XPath execution cost.

Of course, it's always possible that a different XPath 1.0 processor might
recognize these constructs specially, and rewrite them to give performance
equivalent to the 2.0 operators.

Michael Kay

Current Thread