Subject: [xsl] Content constructors and sequences
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Wed, 9 Jan 2002 08:55:24 +0000
Hi,

I'd greatly appreciate comments on the following; I'll post to xsl-editors@xxxxxx and www-xpath-comments@xxxxxx if the comments here don't point out a glaring flaw. Please post if you think it's a good idea, as well as if you think it's a bad one, particularly if you can think of ways of improving the strength of the argument.

Thanks,

Jeni

---

Executive summary
-----------------

Rather than XPath being continuously extended to allow it to do what XSLT can already do, XSLT should be modified to support the thing that it can't already do: sequence construction. This could be achieved by amending the definition of content constructors in XSLT 2.0 and introducing a new xsl:item instruction. This change would make XSLT more consistent and more usable.

Contents
--------

 1. Requirement
 2. Sequence constructors
 3. Producing simple typed values and existing nodes
 4. Impact on XPath
 5. Impact on function definitions
 6. Impact on variable bindings
 7. Allowing rootless nodes
 8. Impact on result tree generation
 9. Conclusions
10. References

Requirement
-----------

Yesterday, David C. posted a message to www-xpath-comments@xxxxxx that described how XPath is restricted by the lack of a general variable-binding expression (let clause) [1]. I think that the lack of a let clause restricts what's practical in XPath (even if it doesn't affect what's theoretically possible). For example, with the for expression, you have to reconstruct any sequence that you create within the for expression each time you use it, which probably isn't particularly efficient and leads to maintenance headaches.
For example:

  for $o in $orders
  return if (count($o/item[(@price * @quantity) > 100]) > 5)
         then do:something($o/item[(@price * @quantity) > 100])
         else do:something-else($o/item[(@price * @quantity) > 100])

The way around this is with functions, because then you can use xsl:variable to assign the variable:

  for $o in $orders return do:process-items($o)

and:

  <xsl:function name="do:process-items">
    <xsl:param name="order" />
    <xsl:variable name="items"
                  select="$order/item[(@price * @quantity) > 100]" />
    <xsl:result select="if (count($items) > 5)
                        then do:something($items)
                        else do:something-else($items)" />
  </xsl:function>

but it's hardly ideal.

The same kind of problem occurs within an if expression within a for expression, when certain variables are relevant within one branch of the if and not in the other. For example:

  if ($string and $keyword)
  then if ((starts-with($string, $keyword) or
            ends-with(substring-before($string, $keyword), ' ')) and
           (not(substring-after($string, $keyword)) or
            starts-with(substring-after($string, $keyword), ' ')))
       then (substring-before($string, $keyword),
             $keyword,
             substring-after($string, $keyword))
       else $string
  else ()

which could be managed with:

  if ($string and $keyword)
  then (for $before in substring-before($string, $keyword),
            $after in substring-after($string, $keyword)
        return if ((not($before) or ends-with($before, ' ')) and
                   (not($after) or starts-with($after, ' ')))
               then ($before, $keyword, $after)
               else $string)
  else ()

but which would be much clearer (and more accurate, since you're not really iterating) as:

  if ($string and $keyword)
  then (let $before := substring-before($string, $keyword),
            $after := substring-after($string, $keyword)
        return if ((not($before) or ends-with($before, ' ')) and
                   (not($after) or starts-with($after, ' ')))
               then ($before, $keyword, $after)
               else $string)
  else ()

Again, you could create a function to do the testing, but if we have to generate new functions every time we want to bind variables, we're going to have them coming out of our ears.

It's certainly true that you could add a let clause to XPath; you could also add a where clause... and a sortby clause... and typeswitches... and even element constructors... but what you end up with is a replication of all the facilities of XSLT, using a non-XML syntax and stuffed inside XML attributes.

Sequence constructors
---------------------

So I'd like to suggest an alternative. Instead of modifying XPath so that it can do all the things that XSLT can do plus construct sequences, why not modify XSLT so that it can construct general sequences rather than just node sequences?

Doing this is (I *think*) simpler than it sounds. In XSLT 2.0, "content constructors" are defined as [2]:

  "a sequence of nodes in the stylesheet that, when evaluated, constructs and returns a sequence of new nodes suitable for adding to the result tree. This sequence is referred to below as the result sequence."

If we modify that definition so that "content constructors" don't necessarily return *nodes* (they should probably then be called "sequence constructors"), it becomes:

  a sequence of nodes in the stylesheet that, when evaluated, constructs and returns a sequence. This sequence is referred to below as the result sequence.

We can then amend the description of XSLT instructions in line with this:

  XSLT instructions then produce a sequence of zero, one, or more items as their result. These items are added to the result sequence. Some instructions, such as xsl:element, return a newly-constructed node (which may have its own attributes, namespaces, children, and other descendants); others, such as xsl:if, return items produced by their own nested sequence constructors.

[There are a couple of incompatibility problems here that I think can be handled; I'll come on to those later.]
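To make the proposed semantics concrete, here is a rough Python model (not XSLT, and not how any real processor works) in which each instruction is a function returning a list of items, and a sequence constructor simply concatenates the lists produced by its instructions. The names `Element`, `sequence_constructor`, `element`, `if_` and `item` are hypothetical stand-ins for a new element node, a sequence constructor, xsl:element, xsl:if and the proposed xsl:item; a Python tuple is assumed to stand for a multi-item sequence.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

# Hypothetical model of the proposal; all names are illustrative only.

@dataclass
class Element:
    """Stand-in for a newly constructed element node."""
    name: str
    children: List[Any] = field(default_factory=list)

# An "instruction" takes a dynamic context (here just a dict of variable
# values) and returns a list of items: atomic values, nodes, or a mixture.
Instruction = Callable[[dict], List[Any]]

def sequence_constructor(*instructions: Instruction) -> Instruction:
    """Evaluate each instruction in order and concatenate their results."""
    def run(ctx: dict) -> List[Any]:
        result: List[Any] = []
        for instr in instructions:
            result.extend(instr(ctx))
        return result
    return run

def element(name: str, content: Instruction) -> Instruction:
    """Like xsl:element: returns one new node whose children come from content."""
    return lambda ctx: [Element(name, content(ctx))]

def if_(test: Callable[[dict], bool], body: Instruction) -> Instruction:
    """Like xsl:if: returns the items from its nested constructor, or nothing."""
    return lambda ctx: body(ctx) if test(ctx) else []

def item(select: Callable[[dict], Any]) -> Instruction:
    """Like the proposed xsl:item: adds the selected value(s) as items."""
    def run(ctx: dict) -> List[Any]:
        value = select(ctx)
        # A tuple stands for a multi-item sequence; anything else is one item.
        return list(value) if isinstance(value, tuple) else [value]
    return run

# Roughly: <xsl:item select="$n"/>
#          <xsl:if test="$n > 0">
#            <xsl:item select="('a', 2.5)"/>
#            <xsl:element name="e"/>
#          </xsl:if>
constructor = sequence_constructor(
    item(lambda ctx: ctx["n"]),
    if_(lambda ctx: ctx["n"] > 0,
        sequence_constructor(
            item(lambda ctx: ("a", 2.5)),
            element("e", sequence_constructor()),
        )),
)

print(constructor({"n": 3}))  # [3, 'a', 2.5, Element(name='e', children=[])]
print(constructor({"n": 0}))  # [0]
```

The point of the model is that nothing forces the result sequence to contain only nodes: atomic values and new nodes sit side by side, and xsl:if just passes through whatever its nested constructor produced.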
Producing simple typed values and existing nodes
------------------------------------------------

All we need now is an element that can add a simple typed value or an existing node to the result sequence. This could be achieved with an xsl:item element:

  <!-- Category: instruction -->
  <xsl:item
    select = expression
    type = datatype>
    <!-- Content: sequence-constructor -->
  </xsl:item>

The xsl:item element works similarly to variable-binding elements: it produces a sequence of items from either its select attribute or its content. This enables you to add simple typed values or existing nodes to a sequence. For example, the equivalent of the for expression that we looked at earlier would be:

  <xsl:variable name="new-orders" type="item*">
    <xsl:for-each select="$orders">
      <xsl:variable name="items"
                    select="item[(@price * @quantity) > 100]" />
      <xsl:item select="if (count($items) > 5)
                        then do:something($items)
                        else do:something-else($items)" />
    </xsl:for-each>
  </xsl:variable>

The value of the $new-orders variable would be a sequence of items.

Impact on XPath
---------------

Enabling XSLT to generate sequences will remove the requirement for XPath to support expressions that involve range variables. For example:

  <xsl:variable name="join" type="xs:integer*"
                select="for $i in (1, 2), $j in (3, 4) return ($i, $j)" />

could be done with:

  <xsl:variable name="join" type="xs:integer*">
    <xsl:for-each select="(1, 2)">
      <xsl:variable name="i" select="." />
      <xsl:for-each select="(3, 4)">
        <xsl:variable name="j" select="." />
        <xsl:item select="($i, $j)" />
      </xsl:for-each>
    </xsl:for-each>
  </xsl:variable>

[Of course a mapping operator would still be useful for simple cases.]

It would also remove the need either for the sort() function in XSLT (and indeed for named sort specifications altogether) or for adopting the sortby clause from XQuery, since the existing xsl:sort can be used.
For example, instead of:

  <xsl:sort-key name="subtotal-sort">
    <xsl:sort select="@price * @quantity"
              data-type="number" order="descending" />
    <xsl:sort select="@part-id" order="ascending" />
  </xsl:sort-key>

  <xsl:variable name="sorted-items"
                select="sort($items, 'subtotal-sort')" />

you could do:

  <xsl:variable name="sorted-items" type="item*">
    <xsl:for-each select="$items">
      <xsl:sort select="@price * @quantity"
                data-type="number" order="descending" />
      <xsl:sort select="@part-id" order="ascending" />
      <xsl:item select="." />
    </xsl:for-each>
  </xsl:variable>

Impact on function definitions
------------------------------

Adding the xsl:item element allows us to get rid of the xsl:result element when defining functions. The xsl:function element's new syntax would be:

  <xsl:function
    name = qname>
    <!-- Content: (xsl:param*, sequence-constructor) -->
  </xsl:function>

The xsl:function element would simply return the sequence produced by its sequence constructor. For example:

  <xsl:function name="my:split-string">
    <xsl:param name="string" type="xs:string" />
    <xsl:param name="keyword" type="xs:string" />
    <xsl:if test="$string and $keyword">
      <xsl:variable name="before"
                    select="substring-before($string, $keyword)" />
      <xsl:variable name="after"
                    select="substring-after($string, $keyword)" />
      <xsl:item select="if ((not($before) or ends-with($before, ' ')) and
                            (not($after) or starts-with($after, ' ')))
                        then ($before, $keyword, $after)
                        else $string" />
    </xsl:if>
  </xsl:function>

Impact on variable bindings
---------------------------

The current XSLT 2.0 WD states:

  "[ERR030] Elements such as xsl:variable, xsl:param, xsl:message, and xsl:result-document construct a new document node, and use the result sequence returned by the content constructor to form the children of this document node. In this case it is an dynamic error if the result sequence contains namespace or attribute nodes. The processor must either signal the error, or must recover by ignoring the offending nodes. The elements, comments, processing instructions, and text nodes in the node sequence form the children of the newly constructed document node."

I'll concentrate on variable-binding elements here (xsl:message and xsl:result-document are covered under "Impact on result tree generation" below). Supporting the creation of sequences means that rather than creating a new document node, variable-binding elements must bind the variable to the result sequence produced by their sequence constructor. This sequence must be able to contain all kinds of nodes.

There is a backwards incompatibility here: if a variable is assigned a value through the content of the variable-binding element, then rather than conceptually holding the "root node of the result tree fragment" as in XSLT 1.0, the variable holds a sequence of items (nodes, assuming you're using the variable as in XSLT 1.0).

Currently, when users get the string value of a result tree fragment, they get the string value of the *root node* of the result tree fragment: the concatenation of the string values of the text node descendants in the result tree fragment. On the other hand, when users get the string value of a sequence, they get the string value of the first item in the sequence. Therefore if you have:

  <xsl:variable name="foo">
    <element>A</element>
    <element>B</element>
  </xsl:variable>

then string($foo) gives "AB" in XSLT 1.0 but would give just "A" in XSLT 2.0 if sequence constructors were supported.

[I don't think that people get the string values of result tree fragments that contain elements very often, because it's rarely useful to create a result tree fragment with internal structure and then proceed to ignore that internal structure, but it does happen.]

Another difference applies if people are used to using node-set() extension functions to convert variables to node sets. As there is no document node, addressing the items in the sequence does not involve stepping down to them.
For example, given the above definition of $foo, the equivalent of the following in XSLT 1.0:

  <xsl:for-each select="exsl:node-set($foo)/element">
    ...
  </xsl:for-each>

is simply:

  <xsl:for-each select="$foo">
    ...
  </xsl:for-each>

[There's an argument that XSLT 2.0 shouldn't have to worry about backwards compatibility with extension functions, but the node-set() extension function is very widely used and is based on the description of result tree fragments from XSLT 1.0.]

These backwards compatibility issues could be resolved by having the type attribute on the variable-binding element determine its behaviour. If the type attribute is not present, then the variable-binding element creates a result tree (as described later), and the variable is bound to a new document node; if the type attribute is specified, then the variable is bound to the sequence. [This is similar to the role played by the separator attribute on xsl:value-of.]

Allowing rootless nodes
-----------------------

Section 3.1 of the XSLT 2.0 WD [3] states:

  "The data model defined in [Data Model] allows a node to be part of a tree whose root is a node other than a document node.

  "Although such nodes may exist transiently during the course of XSLT processing, every node that is processed by an XSLT stylesheet (that is, a node that may be returned in the result of an expression) will belong to a tree whose root is a document node."

This will no longer be true. It will be possible to create sequences containing nodes that do not have a parent. I'm not certain why this restriction applies in XSLT, especially as it is not a restriction in the data model or in XQuery. There might be something here that causes problems for the whole sequence-generation-using-content-constructors idea, but I'm not sure what it would be.

If the suggestion for retaining backwards compatibility with variable-binding elements is used, then if XSLT 2.0 is used like XSLT 1.0 (i.e. without type attributes on variable-binding elements, and without user-defined functions) it is still true that every node that may be returned in the result of an expression will belong to a tree whose root is a document node.

Impact on result tree generation
--------------------------------

The final impact of this change is on result tree generation. This applies to the construction of the content of element nodes, the principal result tree, secondary result trees, messages, and tree variables (those without a type attribute). It also applies, slightly differently, to the construction of comment, attribute, processing instruction, text and namespace nodes (which I'll call simple nodes so that I don't have to repeat their names constantly).

Currently, content constructors construct a sequence of nodes, and this sequence of nodes can be made into a result tree by adding a parent node, or converted to a string to be used as the value of a simple node. Under certain circumstances, the presence of certain types of nodes in the node sequence is a recoverable dynamic error (e.g. attribute nodes when creating a document; element nodes when getting the string value for an attribute).

If we had the more general sequence constructors, result trees would need to be constructed from sequences containing any mixture of simple typed values and nodes, both newly created (rootless) and pre-existing (rooted), rather than from sequences containing just newly created nodes. Pre-existing nodes can be differentiated from newly created nodes by the fact that they already have a parent, are already part of a tree, and are therefore not rootless.
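The three kinds of item that result tree construction would have to deal with can be sketched in Python. This is again a hypothetical model, not any real data model API: `Node`, `append` and `classify` are invented for illustration, and the classification uses the parent test described above.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

# Hypothetical minimal node model: only a name, a parent and children.
@dataclass(eq=False)
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def append(self, child: "Node") -> "Node":
        """Attach child to this node, making it part of this node's tree."""
        child.parent = self
        self.children.append(child)
        return child

def classify(item: Any) -> str:
    """Sort an item from a sequence into one of the three kinds discussed."""
    if not isinstance(item, Node):
        return "simple typed value"
    # Parent test: a node that already has a parent belongs to an existing
    # tree; a parentless node is treated as newly created.
    if item.parent is not None:
        return "pre-existing node"
    return "newly created node"

source = Node("#document")
order = source.append(Node("order"))  # a node in an existing (source) tree
summary = Node("summary")             # built by the stylesheet, no parent yet
sequence = [order, summary, 42]

print([classify(i) for i in sequence])
# ['pre-existing node', 'newly created node', 'simple typed value']
```

Taken literally, the parent test has one gap: a pre-existing document node (such as `source` here) has no parent, so `classify(source)` reports it as newly created. Checking instead whether a node's root is a document node, the "rootless" criterion, would avoid that.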
With pre-existing nodes, there are three options:

  - the pre-existing node is (deep) copied, and replaced in the sequence by the newly created copy (often inappropriate when the sequence provides a value for a simple node)
  - the pre-existing node is ignored
  - the presence of a pre-existing node in a sequence that's used to generate a result tree is a dynamic error, with one of the two above options as a recovery action

Similarly, there are three options for simple typed values:

  - the string value of the simple typed value is used as the value for a newly created text node, which replaces it in the sequence (and which would have to be concatenated with adjacent text nodes)
  - the simple typed value is ignored
  - the presence of a simple typed value in a sequence that's used to generate a result tree is a dynamic error, with one of the two above options as a recovery action

In both cases I think that it's reasonable to make it an error, with the creation of a node as a recovery action. Conceptually, once pre-existing nodes and simple typed values have been replaced by newly created nodes, the sequence could be treated in exactly the same way as it is currently.

Conclusions
-----------

If XPath were extended to be a usable method of generating sequences, it would end up replicating the variable assignment and flow control features that are already available within XSLT. While there is an argument for constructing a language that performs transformations without using XML syntax, that niche is already filled by XQuery. In addition, because XPath expressions are used within attributes in XSLT, XSLT with extended XPath would become a lot harder to read, write, and maintain than the equivalent XSLT instructions.

Extending the concept of 'content constructors' to more general 'sequence constructors' and introducing an xsl:item element to add simple typed values and pre-existing nodes to this sequence gives XSLT the power to construct sequences of all descriptions.
Rather than learning one language for constructing sequences of nodes and a different language with similar constructs for constructing other sequences, you will only have to learn one, unified, language.

References
----------

[1] http://lists.w3.org/Archives/Public/www-xpath-comments/2002JanMar/0026.html
[2] http://www.w3.org/TR/xslt20/#dt-content-constructor
[3] http://www.w3.org/TR/xslt20/#rootless-nodes

---
Jeni Tennison
http://www.jenitennison.com/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list