Re: XSLT V 1.1

Subject: Re: XSLT V 1.1
From: Jeni Tennison <mail@xxxxxxxxxxxxxxxx>
Date: Sat, 16 Sep 2000 20:25:03 +0100
Paul,

>----- Original Message ----- 
>From: Jeni Tennison <mail@xxxxxxxxxxxxxxxx>
>
>> This solution wouldn't handle situations where there are multiple documents
>> being accessed through the same document() call:
>> 
>> <class-files>
>>   <file href="class1.xml" />
>>   <file href="class2.xml" />
>>   <file href="class3.xml" />
>> </class-files>
>> 
>> <xsl:for-each select="document(class-files/file/@href)/classes/class">
>>   <xsl:sort select="@name" />
>>   ...
>> </xsl:for-each>
>
>What is this?  This should not work in current XSLT.

Sorry, Paul, but it both should and does.  When the first argument of
document() is a node set, it is treated as if there were multiple calls to
document(), one on the string value of each node, with the results unioned
together.  So the above document() call translates to:

  document('class1.xml') | document('class2.xml') | document('class3.xml')

From the Rec (Section 12.1):

"When the document function has two arguments and the first argument is a
node-set, then the result is the union, for each node in the argument
node-set, of the result of calling the document function with the first
argument being the string-value of the node, and with the second argument
being the second argument passed to the document function."
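
For concreteness, here is a minimal sketch of a complete XSLT 1.0
stylesheet built around that call.  It assumes that the source document is
the <class-files> listing quoted above and that each referenced file has a
<classes> document element containing <class name="..."> elements; the
plain-text output is purely illustrative.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Assumes the document element of the source is <class-files> -->
  <xsl:template match="/">
    <!-- document() is handed a node set of @href attributes; the result
         is the union of the documents that they name -->
    <xsl:for-each select="document(class-files/file/@href)/classes/class">
      <!-- a single sort across the classes from all of the files -->
      <xsl:sort select="@name" />
      <xsl:value-of select="@name" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Adding a fourth <file> element to the listing requires no change to the
stylesheet, which is the extensibility I was after.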

>Could you please write some accurate usecase?  And 
>then I'l try to show how to do it without document() 'magic' .

This is functionality that I have needed in the past, in particular to
create extensible solutions where I couldn't predict in advance how many
files the stylesheet would need to access.  I don't know how much more
'accurate' a use case can be than one that has actually been used.

>> If document() accepted only single strings (or nodes), sorting a collection
>> of classes drawn from several files would, I think (?), only be possible by
>> going through an intermediary result tree fragment.
>
>And what's wrong with usage of intermediate variable ? 

I wasn't saying there was anything 'wrong' with using an intermediate
variable.  However, there are four issues involved in using one:

Firstly, you can't do it in XSLT 1.0 without using extension functions and
thus reducing the portability of your code.  I realise that that's probably
not an issue in this thread as you are specifically talking about XSLT 1.1,
but it is an indication of why document() *with* this functionality may
have been included in XSLT 1.0.  Possibly if variables had generated node
sets in the first place, then the powers that be would not have defined it
in this way.

Secondly, it's more verbose.  The alternative code (given implicit
rtf->node-set conversion) would be:

<xsl:variable name="docs">
  <xsl:for-each select="class-files/file">
    <xsl:copy-of select="document(@href)" />
  </xsl:for-each>
</xsl:variable>
<xsl:for-each select="$docs/classes/class">
  <xsl:sort select="@name" />
  ...
</xsl:for-each>

Of course the more verbose code may be regarded as a Good Thing.  There's
always a balance to be drawn between readability and the size (and hence
storage space and parse/processing time) of the stylesheet; different
projects will have different priorities.

Thirdly, there are the issues of the base URI to be used for retrieving
further information about the class files.  Let's say for the sake of
argument that the class files themselves are in different directories and
each have further references out - perhaps they point to a module
definition - and that those references are relative to the class files
themselves.

<module href="modules/module1.xml" />

In the initial code, the class nodes that are iterated over are within the
initial document itself, and it is therefore possible to identify the base
URI for resolving these references.  In the above variable declaration, on
the other hand, a new RTF is generated - it's not *pointing to* the nodes,
it's making a new copy of them.  It is therefore harder (if not impossible,
depending on the XML schema) to tell what base URI should be used to
resolve these references.
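
As a sketch of the first case, assuming each <class> element has a
<module href="..." /> child like the one above: when document() is given a
node set as its only argument, each relative URI is resolved against the
base URI of the node that supplied it, so the lookup follows the class file
wherever it lives.  In XSLT 1.0 the nodes of a result tree fragment instead
take their base URI from the variable-binding element, so the same call on
nodes copied into $docs would resolve relative to the stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/">
    <xsl:for-each select="document(class-files/file/@href)/classes/class">
      <xsl:sort select="@name" />
      <xsl:value-of select="@name" />
      <xsl:text>: </xsl:text>
      <!-- module/@href is still a node in the class file, so the
           relative URI is resolved against that file's location; here
           we just print the name of the module's document element -->
      <xsl:value-of select="name(document(module/@href)/*)" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```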

Finally, creating a copy of each of the documents involved means that, in a
naive implementation at least, not only are the documents themselves
stored, but so is a copy of each of them, which would presumably have an
adverse effect on the memory consumption of the XSLT processor.

>If good old xsl:for-each is so 'bad' for aggregation that it should 
>live *inside* document()  why not place xsl:sort into document() ? 
>Just kidding. I want to take for-each *out* of document(). Not 
>to place *more* 'handy things' into document(). 
[and snip explanation for bias against 'handiness']

I can quite see your point and can imagine how frustrating it must be to be
presented with impenetrable code day after day.  As usual, the goal has to
be to 'make the simple things easy and the complex things possible', for
both the stylesheet author and the stylesheet maintainer.

Within XSLT 1.0 there is no way to convert a result tree fragment into a
node set.  This means that certain 'complex things' (and even some 'simple
things') would be impossible if it weren't for 'handy' functions and
operators.  document() is not the only place where an implicit for-each
takes place: the '=' operator performs one whenever a node set is used on
either side.  For example:

  <xsl:for-each select="element[not(. = preceding-sibling::element)]">
    <xsl:value-of select="position()" />. <xsl:value-of select="@name" />
  </xsl:for-each>

gives a numbered list of the names of those elements that do not have a
preceding element with the same content.  To do this without the implicit
for-each behaviour and without rtf to node-set conversion would be (I
think) impossible (the numbering's the tricky bit).  The implicit for-each
with '=' was presumably designed to make this kind of thing possible.

With implicit rtf to node-set conversion it would be possible even without
the implicit for-each:

  <xsl:variable name="unique-elements">
    <xsl:for-each select="element">
      <xsl:if test="not(preceding-sibling::element[. = current()])">
        <xsl:copy-of select="." />
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>
  <xsl:for-each select="$unique-elements/element">
    <xsl:value-of select="position()" />. <xsl:value-of select="@name" />
  </xsl:for-each>

However, I imagine that this would be a lot less efficient as well as being
more verbose.

In fact there have been a couple of questions here in short order saying
"how can I select unique nodes *case-insensitively*?"  By analogy with the
above, the solution is:

  element[not(translate(., $upper, $lower) = 
              translate(preceding-sibling::element, $upper, $lower))]

However, this does not work because the translate() function has no
implicit for-each: it converts the node set to a string by taking the
string value of the first node, and operates only on that.  I hesitate to
suggest it for fear of raising your ire yet more, Paul, but perhaps there's
an argument for having these string functions perform with implicit
for-eaches to permit the above.
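
In the meantime, one way round it in XSLT 1.0 is to swap in a different
technique: index the elements by their lower-cased value with xsl:key and
keep only the first element in each group.  This is a sketch under stated
assumptions rather than a drop-in answer: the parent element name 'list'
and the key name 'by-lower' are made up for illustration, only the ASCII
alphabet is folded, and because keys cover the whole document it finds
elements that are unique across the document rather than among siblings.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'" />

  <!-- Index every element by its lower-cased string value.  The
       alphabets are written out because XSLT 1.0 does not allow
       variable references in the use attribute of xsl:key -->
  <xsl:key name="by-lower" match="element"
           use="translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                             'abcdefghijklmnopqrstuvwxyz')" />

  <!-- 'list' is a hypothetical parent element -->
  <xsl:template match="list">
    <!-- keep an element only if it is the first node, in document
         order, that the key returns for its lower-cased value -->
    <xsl:for-each select="element[generate-id() =
        generate-id(key('by-lower', translate(., $upper, $lower))[1])]">
      <xsl:value-of select="position()" />. <xsl:value-of select="@name" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because the filtering happens in the select expression, position() still
numbers only the elements that survive it.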

Another use case (which kind of brings me full circle) would be where I
have a collection of nodes that identify, say, data sets:

<dataset>
  <data number="1" />
  <data number="2" />
  <data number="3" />
</dataset>

and I want to access all of the documents that are of the form 'dataN.xml'
where N is the number as indicated in the XML above.  This isn't possible
(as far as I can tell) in XSLT 1.0 (but will be within XSLT 1.1).  I would
dearly love to be able to do:

  document(concat('data', dataset/data/@number, '.xml'))

to be able to retrieve them all at once.  But perhaps this just marks me
out as a 'good perl hacker' despite my relative ignorance of Perl ;)

As David Carlisle has pointed out, the above functionality will be made
possible when(/if?) it becomes possible to define XSLT functions for use in
XPath expressions.  Then I could do something like:

  my:document(dataset/data/@number)

with:

<xsl:function name="my:document">
  <xsl:param name="numbers" />
  <xsl:variable name="first-doc"
                select="document(concat('data', $numbers[1], '.xml'))" />
  <xsl:choose>
    <xsl:when test="count($numbers) > 1">
      <xsl:return select="$first-doc |
                          my:document($numbers[position() > 1])" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:return select="$first-doc" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

to perform both those implicit for-eaches explicitly (and actually
recursively) for me.  I'm not sure whether user-defined functions like this
are more or less transparent to the person who has to maintain the code.
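
For comparison, here is a sketch of what XSLT 1.0 does allow for the
<dataset> example: an explicit for-each that visits the documents one at a
time.  It assumes that the dataN.xml files sit alongside the stylesheet.
What it cannot do is gather the documents into a single node set, so
sorting across all of them is still out of reach.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="dataset">
    <!-- the for-each has to be written out: handed the whole node set,
         concat() would use only the first @number -->
    <xsl:for-each select="data">
      <!-- one document per iteration, e.g. data1.xml; the argument is a
           string, so it is resolved relative to the stylesheet (add '.'
           as a second argument to resolve it relative to the source) -->
      <xsl:apply-templates
          select="document(concat('data', @number, '.xml'))" />
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```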

All in all, I think the important thing as we move on to the next stage in
XSLT evolution is that those design patterns that we find ourselves using
time and time again (like selecting unique nodes) should be made easier
through the introduction of functions (and XSLT elements) that *both*
decrease the verbosity of the code *and* enhance its readability.
Introducing user-defined functions will help this a great deal.  For
example, instead of the above XPath to select unique nodes, why not
something like:

  elements[my:unique(., ../elements)]

Allowing authors to create and share their own functions, and to use them
in the way they want to use them, will quickly identify those that are
useful and those that are not, how many arguments they should take, what
type they should be and how they should be used.

Cheers,

Jeni

Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

