Re: [xsl] ordered selection of child elements

Subject: Re: [xsl] ordered selection of child elements
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 8 Mar 2018 09:32:05 -0000
When people ask "why?" I'm never sure whether they mean

(a) where in the spec does it say this should happen?, or

(b) why was the spec written this way?

The second question can then be interpreted as either

(b1) as a matter of historical record, when was the decision made and what
arguments were put forward on both sides?, or

(b2) can you think of any reason why a rational WG would have made this
decision?

The answer to (a) can be found in the XPath 3.1 specification, in the section
on path expressions:


Each operation E1/E2 is evaluated as follows: Expression E1 is evaluated, and
if the result is not a (possibly empty) sequence S of nodes, a type error
is raised [err:XPTY0019]. Each node in S then serves in
turn to provide an inner focus (the node as the context item, its position in
S as the context position, the length of S as the context size) for an
evaluation of E2, as described in 2.1.2 Dynamic Context. The sequences
resulting from all the evaluations of E2 are combined as follows:

If every evaluation of E2 returns a (possibly empty) sequence of nodes, these
sequences are combined, and duplicate nodes are eliminated based on node
identity. The resulting node sequence is returned in document order.

So the sorting into document order is done by the "/" operator. (Note: it
might be a good idea to get into the habit of using "!" rather than "/",
especially for trivial expressions like $e/@x, to save the optimizer the
trouble of working out that it doesn't actually need to do a sort in this
particular instance.)
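
To make the difference concrete, here is a minimal sketch, assuming an XSLT
3.0 processor (the "!" operator needs XPath 3.0 or later). The inline
document in $doc and the labels in the output are my own illustration, not
taken from the original question; b is deliberately placed before a.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical input: b precedes a in document order. -->
  <xsl:variable name="doc">
    <root><b/><a/></root>
  </xsl:variable>

  <xsl:template name="xsl:initial-template">
    <!-- "/" combines the results, removes duplicates and sorts
         them into document order: b, then a. -->
    <xsl:text>slash: </xsl:text>
    <xsl:value-of select="$doc/root/(a, b) ! name()" separator=" "/>
    <xsl:text>&#10;</xsl:text>

    <!-- "!" simply concatenates the results in the order they are
         produced, so the order written in (a, b) survives: a, then b. -->
    <xsl:text>bang: </xsl:text>
    <xsl:value-of select="$doc/root ! (a, b) ! name()" separator=" "/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run from the initial template, this should print "slash: b a" and then
"bang: a b". Bear in mind that "!" does no deduplication either: if the
left-hand side selects several nodes, the results come back grouped per
node, in the order of the left-hand sequence.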

The answer to (b) is roughly as follows.

Firstly, XPath 1.0 actually defines that the expression returns a node-set,
that is, a set of nodes with no defined order. XSLT 1.0 specifies that
constructs like xsl:for-each and xsl:apply-templates process these nodes in
document order. In practice all XPath 1.0 processors that I know of return
node-sets in document order, but there is no requirement in the spec to do so.
I don't know historically why XSLT 1.0 decided on document order (so we're in
(b2) territory here), but interoperability (that is, having all processors
produce identical output) was strong on the WG's requirements list.

Secondly, the question came up again and was hotly debated during the XPath
2.0 deliberations, where I was involved so I can tell you more about it (in
(b1) terms). There was tension here between XQuery developers, who wanted to
give optimizers the maximum freedom to optimize (which in the database world
means using indexes), and XSLT developers, who were (a) more concerned with
interoperability, and (b) more concerned with handling of mixed content (that
is, documents rather than data). The XSL WG was also of course concerned with
backwards compatibility between 1.0 and 2.0.

The rule that path expressions return results in document order is in fact
present in the first published WD of XPath 2.0, and the
minutes show intense discussion on the topic around the summer of 2001. I
remember a particular posting of mine as being successful in swaying some
XQuery participants: it is dated 23 July 2001 and reads:

I was quite keen on Jonathan Marsh's proposal as a way forward on this.
Looking at the analysis we did on Friday, however, I've come to the
conclusion that for mixed content it's just not viable.

Consider the source:

<warning>Do <emph> not</emph> touch the mains switch, the computer will
<emph> explode</emph></warning>

and the stylesheet fragment:

<xsl:template match="warnings">
<p><xsl:apply-templates select=".//*/text()"/></p>
</xsl:template>

At XPath 1.0 the output is:

<p>Do not touch the mains switch, the computer will explode</p>

At XPath 2.0, with Jonathan's proposal, the output would be:

<p>Do touch the mains switch, the computer will not explode</p>

I chose a melodramatic example because I thought it would impress Dana [1],
but I think the point is clear anyway. With mixed element content, or any
document that has hierarchic structures of variable depth, users naturally
expect path expressions consisting only of "/" and "//" operators to return
results in document order, and if we redefine the semantics in
"breadth-first" terms without reordering, users will get results that are
surprising and disconcerting. As Evan pointed out, it's not just a backwards
compatibility issue, it's a usability issue: document order is the natural
order of the results.

So to move forward, I'm now convinced that we need a separate operator for
sequence-based projection, or a single polymorphic operator whose semantics
are inferred from the data type of the operands. I'm now convinced that
changing the existing semantics of "/" isn't on. (Good try, Jonathan: you
nearly persuaded me!)

Mike Kay


[1] Dana Florescu in previous discussion had used arguments based on safety
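
The effect described in that 2001 posting can be reproduced today by
comparing "/" with "!" in the final step. This is only a sketch, assuming an
XSLT 3.0 processor; the warnings wrapper element and the inline $doc variable
are my assumptions to make the example self-contained, and "!" stands in here
for "projection without reordering", not for the exact proposal that was on
the table in 2001.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- The mixed-content source, wrapped in an assumed <warnings> element. -->
  <xsl:variable name="doc">
    <warnings><warning>Do <emph> not</emph> touch the mains switch, the computer will <emph> explode</emph></warning></warnings>
  </xsl:variable>

  <xsl:template name="xsl:initial-template">
    <xsl:for-each select="$doc/warnings">
      <!-- "/" returns the text nodes in document order. -->
      <xsl:value-of
          select="normalize-space(string-join(.//*/text(), ''))"/>
      <xsl:text>&#10;</xsl:text>

      <!-- "!" keeps element-by-element order: first all text children
           of warning, then the text of each emph. -->
      <xsl:value-of
          select="normalize-space(string-join(.//* ! text(), ''))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The first line of output should read "Do not touch the mains switch, the
computer will explode", and the second "Do touch the mains switch, the
computer will not explode".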

> On 8 Mar 2018, at 06:36, Dr. Patrik Stellmann patrik.stellmann@xxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi,
> a question more motivated by curiosity than by a real problem:
> With
>             <xsl:sequence select="a, b"/>
> I will get first element a and second element b, regardless of their order
> within the input document.
> But with
>             <xsl:sequence select="root/(a, b)"/>
> I will get the elements a and b in document order. So this behaves
> identically to
>             <xsl:sequence select="root/(a | b)"/>
> Why?
> Of course I could write
>             <xsl:sequence select="root/a, root/b"/>
> to ensure a specific order. But sometimes the expression for "root" is
> much more complex, so I'd like to avoid writing it twice or putting it in a
> variable.
> Thanks and regards,
> Patrik