RE: [xsl] RE: Re: . in for

Subject: RE: [xsl] RE: Re: . in for
From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Mon, 7 Jan 2002 15:59:07 -0000
> We were discussing with Jeni the problem for optimisation if
> certain functions like
> position() (or any other function, including user-defined
> ones, that uses the result
> of the inner for as a sequence) are used in an outer 'for'
> and the inner 'for'
> decreases the cardinality of its sequence by producing one or
> more empty sequences
> (). This is fully described in:
>
> http://aspn.activestate.com/ASPN/Mail/Message/xsl-list/968103
>
> It is interesting to hear your solution -- where were Jeni
> and I wrong?
>
I've found it difficult to follow all the details of the conversation, in
particular, the optimization that's being attempted.

The situation with position() and last() is (at present) that the "for"
expression doesn't set these (it doesn't change the focus at all). But I
don't think it would be a problem if it did. It would be no different from
the situation with path expressions. Where you get a filter expression like

$x[position()>3][1]

then you can re-order (or combine) the predicates if they are
non-positional, but you can't reorder them if they are positional. They are
positional if they use the position() or last() functions, explicitly or
implicitly. A similar rule would apply to nested "for" expressions.

This doesn't stop the implementation being pipelined. Given the expression
above, Saxon will stop as soon as it finds the first item in $x whose
position() is >3: it never evaluates the full sequence $x[position()>3] and
it doesn't need to store this intermediate result in memory. The same is
true for "for" expressions.

Of course an important potential optimisation for "for" expressions (and
"some" and "every") is to optimize joins using hash tables or the like.
Saxon doesn't currently attempt this - it always does a "nested loop"
evaluation. It could be done: it's largely a question of (a) identifying
commonly-used constructs where optimisation is possible, and then (b)
identifying the constraints that have to be satisfied for optimisation to be
safe. The constraints would almost certainly include not using position()
and last(). One thing to look at would be reducing path expressions, "for"
expressions, and "some" and "every", to some kind of canonical internal
format that enables the optimizations to be recognized more easily: for
example

  exists(for $x in X return f($x))

is (I think!) equivalent to

  some $x in X satisfies exists(f($x))

The XQuery formal algebra, when it comes out, will help in defining such
equivalences.

Mike Kay



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread