Re: [xsl] Does the count() function require access to the whole subtree?

From: David Carlisle <davidc@xxxxxxxxx>
Date: Wed, 15 Jan 2014 11:16:58 +0000
On 15/01/2014 10:48, Costello, Roger L. wrote:
A couple days ago Michael Kay wrote:

count(//x) is streamable, but data(//x) is not. Here //x is a
"crawling" expression - one that selects nodes which may overlap
each other. When an expression returns (potentially) overlapping
nodes, the W3C spec says you can apply inspection operations like
count() to those nodes, but you cannot apply absorption expressions
like data(), because doing so would require buffering.

I'm getting hung up on terminology: crawling, inspection,
absorption, overlap.

Even though (I think) that I now understand those terms, I still
don't understand why one expression is streamable while another is
not. For example, why is count(//x) streamable whereas data(//x) is
not?  I don't want to remember a bunch of definitions and rules
(which I will quickly forget). I want to understand the concepts
(which I won't forget).

I understand best with examples. Consider this XML:

<Document> <x> <x>A</x> B </x> </Document>

This XPath expression:

//x

returns a sequence of two subtrees:

<x> <x>A</x> B </x>

<x>A</x>

Michael, would you take us  (conceptually) through the process an
XSLT processor would go through - as it incrementally steps through
the XML document - to count the number of <x> elements?

And would you explain why the data(//x) operation would require
buffering and is therefore not streamable, please?

/Roger



In case Michael is distracted by the fact that he actually knows what the system is doing, I'll attempt to give a top-level view (just using an XSLT 2.0 stylesheet for demonstration):

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
:<xsl:sequence select="count(//x)"/>
:<xsl:sequence select="data(//x)"/>
:<xsl:sequence select="//x/data(.)"/>
</xsl:template>

</xsl:stylesheet>

count(//x) just returns a single value, so the system could conceptually
go through the document once, adding one to a counter every time it sees
the start of an x element. Nothing has to be remembered except the
running total.
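
As a rough illustration of that single pass (this is not how any real XSLT processor is implemented, and the names CountX and count_x are made up for the sketch), here is a small Python SAX handler that keeps only a counter while the document goes by:

```python
import xml.sax


class CountX(xml.sax.ContentHandler):
    """Counts <x> start tags; the only state kept is one integer."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Each start tag is seen exactly once, in document order.
        if name == "x":
            self.count += 1


def count_x(xml_text):
    # Hypothetical helper: run the handler over a string of XML.
    handler = CountX()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.count


DOC = "<Document> <x> <x>A</x> B </x> </Document>"
print(count_x(DOC))  # prints 2
```

Whether the two x elements overlap makes no difference here, because nothing inside them has to be kept.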

data(//x) is the same as //x/data(.) and returns the data of each
element in the sequence.

Now the first element in the sequence is the outer x. To work out its
data() the system has to process the _full_ content of that element
(which here is nearly the whole document). Then the next element of the
sequence is the inner x. Oops, we passed that already, so you need to
back up to get it, which means its content would have had to be
buffered. This is where the concern about "overlapping trees" comes from.
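
The same kind of Python sketch shows the problem, again only as a conceptual model under my own assumptions (DataOfX and data_of_x are hypothetical names, and the string value stands in for what data() returns on untyped elements). To produce a value for every x, the handler has to keep a text buffer for each x that is still open, and the values become available in a different order from the one the sequence needs:

```python
import xml.sax


class DataOfX(xml.sax.ContentHandler):
    """Collects the string value of every <x>, buffering while it is open."""

    def __init__(self):
        super().__init__()
        self.next_index = 0  # document-order position of the next <x>
        self.open_x = []     # (index, text pieces) for each <x> still open
        self.finished = []   # (index, string value) in completion order

    def startElement(self, name, attrs):
        if name == "x":
            self.open_x.append((self.next_index, []))
            self.next_index += 1

    def characters(self, content):
        # Text inside nested <x> elements belongs to every open ancestor <x>.
        for _, pieces in self.open_x:
            pieces.append(content)

    def endElement(self, name):
        if name == "x":
            # Elements close innermost first, so the last one opened is done.
            index, pieces = self.open_x.pop()
            self.finished.append((index, "".join(pieces)))


def data_of_x(xml_text):
    # Hypothetical helper: run the handler over a string of XML.
    handler = DataOfX()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.finished


DOC = "<Document> <x> <x>A</x> B </x> </Document>"
finished = data_of_x(DOC)
print(finished)          # [(1, 'A'), (0, ' A B ')]
print(sorted(finished))  # [(0, ' A B '), (1, 'A')]
```

The inner x (index 1) is complete first, but the result sequence has to start with the outer x (index 0), so the inner value must be held until the outer element ends. With deeper nesting the amount held grows with the document, which is the buffering that streaming is meant to avoid.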

David
