[xsl] Positional predicates de-mystified

Subject: [xsl] Positional predicates de-mystified
From: "Evan Lenz" <elenz@xxxxxxxxxxx>
Date: Thu, 3 Jan 2002 10:23:18 -0800
Dave Pawson requested that I provide a "Jeni Tennison explanation" of this
business about XPath 1.0 positional predicates and forward and reverse axes,
for inclusion in the FAQ. Not to be outdone, here goes:


Positional predicates in XPath 1.0 can be confusing. If you see "[3]" in an
expression, it might select the third node in *document order*, or it might
select the third node in *reverse document order*. How do you know which?
Let's try a brief quiz. For each of the expressions below, does "[3]" select
the third node in (forward) document order, or in reverse document order?

foo[3]
ancestor::foo[3]
following::foo[3]
preceding::foo[3]

If you answered forward/reverse/forward/reverse, then you'd be right! The
(default) child axis is used in "foo[3]" and since child is a forward axis,
"[3]" selects the third node in document order. The ancestor axis in
"ancestor::foo[3]" is a reverse axis; consequently, that expression selects
the third node in *reverse document order*. The same principle applies for
the following (forward) and preceding (reverse) axes.
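
To see the first quiz in action, here is a minimal XSLT 1.0 sketch. The
input document, the <here/> marker element and the id attributes are all
made up for illustration. The stylesheet just prints the id of the foo
that each location step selects.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input document (names and ids are illustrative only):

     <doc>
       <foo id="1"/>
       <foo id="2"/>
       <foo id="3"/>
       <here/>
       <foo id="4"/>
       <foo id="5"/>
       <foo id="6"/>
     </doc>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- child is a forward axis: [3] counts in document order -->
    <xsl:text>foo[3] = </xsl:text>
    <xsl:value-of select="doc/foo[3]/@id"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Make the illustrative <here/> element the context node -->
    <xsl:for-each select="doc/here">
      <!-- following is a forward axis: [3] counts forward from <here/> -->
      <xsl:text>following::foo[3] = </xsl:text>
      <xsl:value-of select="following::foo[3]/@id"/>
      <xsl:text>&#10;</xsl:text>

      <!-- preceding is a reverse axis: [3] counts backward from <here/> -->
      <xsl:text>preceding::foo[3] = </xsl:text>
      <xsl:value-of select="preceding::foo[3]/@id"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With a conforming XSLT 1.0 processor this should print "foo[3] = 3",
"following::foo[3] = 6" and "preceding::foo[3] = 1", one per line.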

Now try these:

$var[3]
(foo | bar)[3]
(ancestor::foo)[3]
id('foo')[3]

Hmm, how do we know which direction (forward or reverse) applies to these? The
answer may surprise you. It is: ALWAYS FORWARD. Why? Because these examples
represent a different kind of predicate! In XPath 1.0, predicates can be
part of a location step (as in the first quiz's examples) and they can also
be applied to any expression (as in the second quiz's examples). When a
positional predicate is applied to an *expression* (as opposed to being part
of a location step), it is always evaluated as if on a forward axis,
which is to say that it always counts nodes in document order.

This rule applies even in the third example above:

(ancestor::foo)[3]

But, you might say, the ancestor axis is a reverse axis! And you'd be right,
but it doesn't matter in this case! The predicate doesn't care what axis was
used here because it is not part of the location step; rather it filters an
expression. The parentheses ensure that "ancestor::foo" is first evaluated
as an expression in its own right (without a predicate), yielding an
unordered node-set. Only after that is the predicate applied to the result
of that expression, selecting the third node in document order. Such a big
difference those little parentheses can make!
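
If you want to watch the parentheses make that difference, here is a
minimal XSLT 1.0 sketch. The nested input document, the <here/> marker
element and the id attributes are assumptions made purely for
illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input document (names and ids are illustrative only):

     <foo id="a">
       <foo id="b">
         <foo id="c">
           <here/>
         </foo>
       </foo>
     </foo>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Make the illustrative <here/> element the context node -->
    <xsl:for-each select="//here">
      <!-- Predicate is part of the step: counted along the reverse axis,
           so position 3 is the outermost ancestor -->
      <xsl:text>ancestor::foo[3] = </xsl:text>
      <xsl:value-of select="ancestor::foo[3]/@id"/>
      <xsl:text>&#10;</xsl:text>

      <!-- Predicate filters the parenthesized expression: counted in
           document order, so position 3 is the innermost ancestor -->
      <xsl:text>(ancestor::foo)[3] = </xsl:text>
      <xsl:value-of select="(ancestor::foo)[3]/@id"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With a conforming XSLT 1.0 processor this should print
"ancestor::foo[3] = a" and "(ancestor::foo)[3] = c": the same three
ancestors, counted from opposite ends.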

Hopefully the above examples have helped clear up some confusion, but if
you'd like a slightly more technical explanation (and review) of how this
all works, keep on reading.

The relevant prose in the XPath 1.0 spec can be found at [URI1] and [URI2].

"2.4 Predicates"[URI1] explains the difference between "forward" and
"reverse" axes and how predicates are always evaluated with respect to a
forward or reverse axis. The distinction only affects the expression
result when the predicate is a positional predicate, i.e. when the
predicate expression evaluates to a number or otherwise depends on
position().

The indented "NOTE" in "3.3 Node-sets"[URI2] is helpful as it contains an
example that highlights the difference between the following two
expressions:

preceding::foo[1]

(preceding::foo)[1]

The former selects the first node in *reverse document order*. The latter
selects the first node in *document order*. This is determined by the axis
with respect to which the predicate is evaluated, i.e. whether it is a
forward or reverse axis.

While predicates are defined semantically in one way (and in one
place[URI1]), they appear syntactically (i.e. in the XPath grammar
productions) in two places: 1) as part of the Step production[URI3], and 2)
as part of the FilterExpr production[URI4].

The predicate in the first example above is part of the Step itself, which
is to say that it is tightly bound to the preceding::foo step. Therefore,
the axis used to evaluate the positional predicate is a reverse axis,
because the preceding axis is a reverse axis. Consequently, the predicate
filters out all nodes but the first node in *reverse document order*.

The predicate in the second example is not part of the Step, but is part of
a more general FilterExpr. In XPath 1.0, a predicate may follow any kind of
expression; in this case, it follows a parenthesized expression. The
parentheses render preceding::foo an expression in its own right (without a
predicate), yielding an opaque node-set. A node-set never retains
information about what axis was used to select it. That node-set result is
subsequently filtered with a predicate. When a predicate applies to an
expression, the XPath spec says, it is evaluated with respect to the child
axis. (The child axis is arbitrarily chosen because it's an example of a
forward axis.) Consequently, the predicate filters out all nodes but the
first node in *document order*.
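
The point that a node-set forgets its axis is easy to check with a
variable. The following is only a sketch under assumed names: the input
document, the <here/> marker element, the id attributes and the variable
name "before" are all hypothetical.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input document (names and ids are illustrative only):

     <doc>
       <foo id="1"/>
       <foo id="2"/>
       <foo id="3"/>
       <here/>
     </doc>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Make the illustrative <here/> element the context node -->
    <xsl:for-each select="doc/here">
      <!-- The node-set is selected with a reverse axis... -->
      <xsl:variable name="before" select="preceding::foo"/>

      <!-- ...but the variable reference is a plain expression, so the
           predicate counts in document order -->
      <xsl:text>$before[1] = </xsl:text>
      <xsl:value-of select="$before[1]/@id"/>
      <xsl:text>&#10;</xsl:text>

      <!-- Same axis, predicate inside the step: counts in reverse -->
      <xsl:text>preceding::foo[1] = </xsl:text>
      <xsl:value-of select="preceding::foo[1]/@id"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With a conforming XSLT 1.0 processor this should print "$before[1] = 1"
and "preceding::foo[1] = 3".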

The Step production[URI3] is as follows:

[4]    Step    ::=    AxisSpecifier NodeTest Predicate*
                      | AbbreviatedStep

Note the "Predicate*" part above. This denotes that multiple predicates can
be part of a single location step. Understanding this will dispel any
confusion about how an expression like the following is evaluated:

preceding::foo[@bar][1]

"[1]" selects the first node in *reverse document order*, because the
positional predicate, even though it's not the first predicate, is still
tightly bound to, i.e. a part of, the location step.

The FilterExpr production[URI4] is as follows:

[20]    FilterExpr    ::=    PrimaryExpr
                             | FilterExpr Predicate

An example instance of this production can be seen by slightly modifying the
last example:

(preceding::foo[@bar])[1]

Here, "[1]" selects the first node in document order.
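
Here is a minimal XSLT 1.0 sketch that puts the last two expressions side
by side. The input document, the <here/> marker element and the attribute
values are invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input document (names and values are illustrative only):

     <doc>
       <foo id="1" bar="x"/>
       <foo id="2"/>
       <foo id="3" bar="y"/>
       <foo id="4"/>
       <here/>
     </doc>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Make the illustrative <here/> element the context node -->
    <xsl:for-each select="doc/here">
      <!-- Both predicates belong to the step: [1] counts in reverse
           among the foo elements that have a bar attribute -->
      <xsl:text>preceding::foo[@bar][1] = </xsl:text>
      <xsl:value-of select="preceding::foo[@bar][1]/@id"/>
      <xsl:text>&#10;</xsl:text>

      <!-- [1] now filters the parenthesized expression: it counts in
           document order among the same elements -->
      <xsl:text>(preceding::foo[@bar])[1] = </xsl:text>
      <xsl:value-of select="(preceding::foo[@bar])[1]/@id"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With a conforming XSLT 1.0 processor this should print
"preceding::foo[@bar][1] = 3" and "(preceding::foo[@bar])[1] = 1".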

Evan Lenz

[URI1] http://www.w3.org/TR/xpath#predicates
[URI2] http://www.w3.org/TR/xpath#node-sets
[URI3] http://www.w3.org/TR/xpath#NT-Step
[URI4] http://www.w3.org/TR/xpath#NT-FilterExpr


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

