Re: [xsl] XPath expression to check that there are no intervening elements?

Subject: Re: [xsl] XPath expression to check that there are no intervening elements?
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 20 Jul 2016 15:36:29 -0000
> On 20 Jul 2016, at 09:36, Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> I did some timings:
>
>
> Kay 3 - 32.18ms (ouch!)
>
> doc ! (let $fsm := map{
> 0: function($x) {if ($x[self::B]) then 1 else 0},
> 1: function($x) {if ($x[self::B]) then 1 else 2},
> 2: function($x) {if ($x[self::B]) then 3 else 2},
> 3: function($x) {3}
> } return
> (fold-left(*, 0, function($state, $node){$fsm($state)($node)}) ne 3))
>


I decided to do some analysis of this disappointing result.

It turns out that dynamic function calls are quite expensive, and this is
almost entirely due to the very dynamic nature of the type checking that is
done (so-called function coercion).

Running the above in a slightly different test environment I got a baseline
time of 41ms.

Java profiling showed an inordinate amount of time spent computing hashcodes.
It turned out that during the dynamic call of the various "function($x)"
functions, Saxon was checking that the supplied item type element(C) was a
subtype of the required type item()*, which it does by first checking this
pair of types in a cache of known type relationships, and the lookup in the
cache involved an expensive hashcode computation. Avoiding the lookup by
applying the knowledge that everything is a subtype of item()* reduced the
total time to 29ms.

I achieved a further reduction to 22ms by replacing the dynamic function call
$fsm($state) with a static call map:get($fsm, $state).

I then got this down to 13ms by avoiding the inline functions for computing
the state transition, changing the query to

declare namespace map='http://www.w3.org/2005/xpath-functions/map';
doc ! (let $fsm := map{
0: (1,0),
1: (1,2),
2: (3,2),
3: (3,3)}

return (
  fold-left(*, 0, function($state, $node){
             map:get($fsm,$state)[if ($node[self::B]) then 1 else 2]
             }) ne 3))

Even though there is now only one dynamic function call per element in the
input sequence (and this is unavoidable) the cost still seems to be dominated
by dynamic function calling.

Declaring the types of the arguments ($state, $node) made this worse, because
it increases the amount of dynamic type checking that needs to be done. This
is quite different from the usual scenario where declaring types enables the
checking to be done statically.

There is clearly a lot of scope for optimization in this area. It's actually
the first time I've looked seriously at the performance of higher order
functions in a real query.

Michael Kay
Saxonica

Current Thread