RE: [xsl] Testing by counting or positional predicate

Subject: RE: [xsl] Testing by counting or positional predicate
From: Kay Michael <Michael.Kay@xxxxxxx>
Date: Thu, 11 Jan 2001 12:42:01 -0000
> So, I was wondering: which is more efficient for testing whether a
> $subset node set is a subset of a $superset node set?
> 
>   count($subset|$superset) = count($superset)
> 
> or:
> 
>   not(($subset|$superset)[count($superset) + 1])
>
In Saxon, I suspect both of these will be dominated by the union operation,
which is the same in both cases. Both union and count() force the node-set
to be sorted into document order and de-duplicated. Unlike not(), which
merely tests whether it is empty. 

In principle both count($X) and not($X[n]) could be done by counting
distinct elements without actually sorting, but in practice Saxon doesn't
have a separate "count distinct" operation, it's simpler to do the sort and
the de-duplication as one operation.

If both node-sets are already sorted then the count(A|B)=count(B)
formulation will be slightly faster, when the node-sets are large, because
it just compares two integers whereas not((A|B)[N]) actually iterates over
the elements until it finds the Nth or reaches the end, whichever happens
first. There's scope for an optimisation here, the only thing that requires
care is that it only works when N is an expression that has no dependency on
the context node or context position. (I got tripped up once with something
like $A[count(//*)]: the value of count(//*) is the number of elements in
the document containing the context node, which may be different for
different nodes in $A).

There are cases where the not() formulation will be faster, because it is
pipelined, but I think this will only happen where the cardinality of
$subset|$superset is significantly larger than the cardinality of $superset,
which I would think is an unusual scenario.


 
> Does this extend to testing for identity or is the unioned set so
> small that it doesn't make any odds?
> 
>   count($node1|$node2) = 1
> 
> or:
> 
>   not(($node1|$node2)[2])
>   
If both node-sets are singletons then there's going to be no noticeable
difference.

Mike Kay 

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread