Re: [xsl] Things that make you go Hmmmm!

Subject: Re: [xsl] Things that make you go Hmmmm!
From: "Abel Braaksma (Exselt)" <abel@xxxxxxxxxx>
Date: Sat, 29 Mar 2014 15:05:33 +0100
On 29-3-2014 13:42, Ihe Onwuka wrote:
> On Sat, Mar 29, 2014 at 11:57 AM, Andrew Welch <andrew.j.welch@xxxxxxxxx> wrote:
>> That the content models are still different (not just the select
>> attribute). 
> The only other difference I see is that xsl:copy has a
> use-attribute-set attribute and the xsl:copy-of  doesn't.
> I'd make them the same.

Now I see your point! So let me try to answer this one ;).

You are saying that the attributes on the instructions are different (I
was led astray by the words "content model", sorry about that). In fact,
if we look at the XSLT 3.0 specification, these are the official
definitions of the two instructions:

<!-- Category: instruction -->
<xsl:copy
  select? = expression
  copy-namespaces? = "yes" | "no"
  inherit-namespaces? = "yes" | "no"
  use-attribute-sets? = eqnames
  type? = eqname
  validation? = "strict" | "lax" | "preserve" | "strip"
  on-empty? = expression >
  <!-- Content: sequence-constructor -->
</xsl:copy>

<!-- Category: instruction -->
<xsl:copy-of
  select = expression
  copy-namespaces? = "yes" | "no"
  type? = eqname
  validation? = "strict" | "lax" | "preserve" | "strip" />

We have two instructions here: shallow copy (xsl:copy, which has a
sequence constructor) and deep copy (xsl:copy-of, which does not have a
sequence constructor). The differences are big (not just
use-attribute-sets), and I think that is for a reason:

xsl:copy copies the current node only, without its children, which
allows the children (and, for an element, its attributes) to be treated
differently in subsequent matching templates. This instruction is at the
heart of XSLT (it is found in the introductory chapters of many XSLT
books, and the XSLT Cookbook has many examples) and is used in the
identity template, giving you fine-grained control over which elements,
attributes etc. are changed, renamed or removed.

For instance, the following code uses the identity template and modifies
the source document by removing all <log> elements and their children,
leaving the rest intact.

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
<xsl:template match="log" />

xsl:copy-of allows deep-copying of a selected node (I would agree with
Liam here: if this had been named xsl:deep-copy, it would have been
easier to memorize, but the name has been there since XSLT 1.0, so there
is little we can do about that now). A deep copy means an immediate copy
of the selected node and all its descendants, not allowing any further
processing. This is sometimes useful, e.g. in the case where you are
absolutely certain you do not want any further processing of the
children of a node.

For instance, if we take the previous example and reverse it, and we say
that we _only_ want the <log> elements of an input document and ignore
the rest, we can do this:

<xsl:template match="@*|node()">
    <xsl:apply-templates select="@*|node()"/>
</xsl:template>
<xsl:template match="log">
   <xsl:copy-of select="." />
</xsl:template>

In fact, this can be rewritten as a simple <xsl:copy-of select="//log"
/>, which in most cases will do the same. There is little added benefit
in using a template if all you want to do is a copy-of (deep copy).

But if you want to do _anything_ at all with the children of the <log>
element, for instance, if the log statement is severe, add a red font
style to it, then you are back at the shallow copy version.
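
As a rough sketch of that last case (the severity attribute on <log> and
the style attribute on the output are assumptions made up for this
illustration, they are not part of the examples above), the shallow copy
version could look something like this:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity template: shallow-copy each node, then process its
       attributes and children through further template matching -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- assumed markup: <log severity="severe">...</log> -->
  <xsl:template match="log[@severity = 'severe']">
    <xsl:copy>
      <!-- copy the existing attributes first, then add the new one -->
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="style">color: red</xsl:attribute>
      <!-- children still go through the normal matching templates -->
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With xsl:copy-of there is no place to hook in the extra attribute, which
is why the shallow copy is needed here.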

Let's look at the (many!) differences between the "content model" of
xsl:copy and xsl:copy-of, and why they are the way they are.

1) the select attribute
On xsl:copy it is optional. That is a no-brainer for backwards
compatibility. As Michael Kay already said, there are very few actual
use-cases for it, but for those that exist, like the
snapshot-stylesheet, it is a good nice-to-have. One could also argue
about orthogonality here, which is another reason this "bug" in the
older specs is fixed in XSLT 3.0.

On xsl:copy-of it is mandatory. At one point I argued in the WG for
making it optional, defaulting to the sequence constructor, as in many
other instructions (again, using the orthogonality argument). The
argument against this was equally good or better: the orthogonality
principle would mean that the sequence constructor ought to be used if
the select attribute is absent. But if we allowed that, xsl:copy-of with
a sequence constructor would be a no-op. In other words, it is a useless
addition. Too much orthogonality is not good if it doesn't serve a
purpose or use-case.
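
To illustrate the optional select on xsl:copy, here is a minimal sketch
for a processor that supports XSLT 3.0. The function name my:mark, its
namespace and the marked attribute are hypothetical, chosen only for this
example. Inside a stylesheet function there is no context item, so a
plain xsl:copy has nothing to copy, and select fills that gap:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="http://example.com/my">

  <!-- hypothetical helper: shallow-copies an element and marks it -->
  <xsl:function name="my:mark" as="element()">
    <xsl:param name="e" as="element()"/>
    <!-- no context item inside a function, hence select="$e" -->
    <xsl:copy select="$e">
      <xsl:attribute name="marked">yes</xsl:attribute>
    </xsl:copy>
  </xsl:function>

  <!-- apply the helper to the document element -->
  <xsl:template match="/">
    <xsl:sequence select="my:mark(*)"/>
  </xsl:template>

</xsl:stylesheet>
```

This is still a shallow copy: the children of the selected element are
not copied unless the sequence constructor does so explicitly.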

2) the use-attribute-sets attribute
This one is on xsl:copy, because xsl:copy allows the children to be
modified. The addition of use-attribute-sets serves the use-cases where
you want to replace or augment the attributes on a certain element.

It is not available on xsl:copy-of because the semantics of xsl:copy-of
are different: it makes an unchangeable deep-copy of the selected node.
Allowing use-attribute-sets would change that semantics, which is a bad
idea. If you want to change the children, use xsl:copy, not xsl:copy-of.
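
A small sketch of use-attribute-sets on xsl:copy (the attribute set name
"reviewed", the status attribute and the <chapter> element are
assumptions for the sake of the example):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- hypothetical attribute set, added to every copied <chapter> -->
  <xsl:attribute-set name="reviewed">
    <xsl:attribute name="status">reviewed</xsl:attribute>
  </xsl:attribute-set>

  <!-- identity template for everything else -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- shallow copy of <chapter>, augmented with the attribute set -->
  <xsl:template match="chapter">
    <xsl:copy use-attribute-sets="reviewed">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Note that the attributes from the set are added before the sequence
constructor is evaluated, so in this sketch a status attribute already
present on the source <chapter> is copied later and takes precedence.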

3) the on-empty attribute
This is a new attribute in XSLT 3.0 to aid with streaming scenarios.
There is still a lot of discussion about this attribute, and there are a
few public bug reports about it. But the reason it is not on xsl:copy-of
is that it doesn't serve a (streaming) use-case there: you can use
fn:has-children() instead (whereas there are scenarios that cannot be
solved without @on-empty on xsl:copy).

4) the inherit-namespaces attribute
Available on xsl:copy, not on xsl:copy-of. The same argument applies as
for use-attribute-sets: the semantics of xsl:copy-of do not allow the
selected node to change, so it has no place on xsl:copy-of. Note that
the default is "yes"; if set to "no", it can result in namespace
undeclarations as a result of namespace fixup. I believe it was
introduced for XML 1.1.

5) sequence constructor
The xsl:copy-of instruction is always empty, whereas xsl:copy has a
sequence constructor (that can be empty). There is no use for a sequence
constructor in xsl:copy-of, because it creates a deep copy of the
selected nodes. Conversely, for xsl:copy, there must be a sequence
constructor available, because the primary use-case is to alter the
children of the (selected) node.

6) Conclusion?
I hope the above helps you understand better what the differences are
between xsl:copy and xsl:copy-of. Despite their perhaps ill-chosen
names, they are very different, serve very different use-cases and have
very different semantics: deep copy versus shallow copy.

Note that in XSLT 3.0, we have introduced the new xsl:mode declaration,
which allows programmers to choose what happens with unmatched nodes:
deep-skip, shallow-skip, deep-copy, shallow-copy, text-only-copy, fail.
This reduces the need to write out the identity templates and changes
the default behavior of unmatched nodes. However, it does not replace
xsl:copy or xsl:copy-of, it is just a convenience method for often
occurring scenarios.
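
As a minimal sketch, assuming a processor that supports XSLT 3.0, the
earlier example that removes all <log> elements could be written without
the hand-written identity template:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- nodes without a matching template are shallow-copied and their
       children processed, as the identity template would do -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- drop <log> elements and everything below them -->
  <xsl:template match="log"/>

</xsl:stylesheet>
```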

>>> As to my point. For the same reasons I would expect the content models
>>> of xsl:next-match, xsl:apply-templates and xsl:apply-imports to be the
>>> same.
>> What would the xsl:sort child of xsl:next-match or xsl:apply-imports do?
> What does xsl:value-of a null string do? Should I prevent it. (You
> must have missed that one yesterday).

In XSLT null strings are not possible. Sorry. An empty sequence is
possible, however. When you try to use xsl:value-of on an empty sequence
it does what you would expect: nothing (it is empty, right?).
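
A tiny sketch to try this out (the <result> wrapper element is only
there for illustration; version 2.0 is used because the literal empty
sequence () is not available in XPath 1.0):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <result>
      <!-- selecting the empty sequence: no error is raised, and
           <result> ends up without any text content -->
      <xsl:value-of select="()"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```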

> In a language that supports functions as first class values what does
> f = g mean where f and g are functions. Do I prevent it.

f = g is using the equality operator. I do not know why you would want
to compare function items, but function items, just like any other item,
can be compared. You already said "first class values", so I think it
just does what you expect: compare if two functions are the same.

> What does xsl:value-of select="$x" where $x is a function do. Do I prevent it.

Function items are not automatically convertible to strings. That means
that you will get an error if you try to do this. The reason is, I
think, that there is no good way to define how a function item should be
converted (and looking at existing languages that allow this, we see
many different approaches).

> Why are the arguments to min xs:anyAtomicType which means it can be
> applied to a type that does not have a well defined collation sequence
> e.g functions again. Why doesn't the language prevent that.

I am not sure I understand the question, esp w.r.t. "min
xs:anyAtomicType". However, functions are not strings, and cannot be
atomized into strings (at least not automatically). So there is nothing
the language can do here, hence it is prevented.

> What is the advantage  of restricting a language constructs content
> model to the use-cases you can foresee today. What is the disadvantage
> of doing that?

There is no advantage of restricting the "content model" (I assume you
mean the set of instructions, functions, declarations of XSLT?) to
use-cases we can foresee today, but unfortunately, it is the _only_
thing we can do: we come up with a set of use-cases and we create a
language to serve those use-cases, but we cannot come up with use-cases
that we cannot foresee, hence it is impossible to devise any language
that serves unforeseen use-cases.

However, as with many language designs, we often find that unforeseen
use-cases can still be served. And sometimes, we explicitly know certain
use-cases, but consider it out of scope for the language because either
the use-case is not general enough (for which purpose extensions and
interop are created, as in almost any language) or backwards
compatibility prevents us from adding it to the language.

As to your earlier question in this thread, about "cheap seats", I think
you meant that it is awkward to wait until XSLT 3.0 becomes a standard
and is implemented into processors, for a relatively trivial new feature
that helps you with certain use-cases. That is indeed unfortunate, but
not a process we can easily speed up. Of course, you can already start
using XSLT 3.0 and some processors, including ours, have implemented
this specific "hmmmm" feature of xsl:copy/select. Though I must admit, I
doubt there are many situations where this feature is really a
requirement, but I am open to real-world situations that prove me wrong.

Apologies for the lengthy answer in this mail, but I hope my answers
have helped you understand some language features better and/or have
helped you understand how these language features came to fruition.


Abel Braaksma
Exselt XSLT 3.0 processor
