Re: [xsl] Does the count() function require access to the whole subtree?

Subject: Re: [xsl] Does the count() function require access to the whole subtree?
From: "Costello, Roger L." <costello@xxxxxxxxx>
Date: Thu, 16 Jan 2014 10:39:24 +0000

In explaining why count(//x) is streamable whereas data(//x) is not, David
Carlisle wrote:

	count() just returns a single value and the
	system could conceptually go through the
	document once counting every time it sees x.

	data(//x) is the same as //x/data(.) and returns
	the data of each element in the sequence. Now
	the first element in the sequence is the outer x.
	To work out its data the system has to process
	the _full_ content of that element (which actually
	is the whole document). Then the next element
	of the sequence is the inner x. Oops, we passed
	that already, so you need to back up to get that.
	This is where the concern about "overlapping trees"
	comes from.
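
David's point can be tried out with a small event-driven sketch. This is only an illustration, not how any XSLT processor is implemented: Python's xml.sax stands in for a streaming parser, the names CountX, DataX and run are made up for this example, and the sample document is the nested <x> case with its whitespace removed so the values are easy to read. Counting needs one integer; collecting the data of each <x> needs one buffer per open <x>, and the outer value is only known after the inner one has already gone by.

```python
import xml.sax

# Assumption: the nested <x> sample, with whitespace stripped for readability.
DOC = "<Document><x><x>A</x>B</x></Document>"


class CountX(xml.sax.ContentHandler):
    """Analogue of count(//x): a single integer, bumped at each <x> start tag."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "x":
            self.count += 1


class DataX(xml.sax.ContentHandler):
    """Analogue of data(//x): every open <x> needs its own text buffer."""

    def __init__(self):
        super().__init__()
        self.open_buffers = []  # one buffer per <x> that is currently open
        self.completed = []     # values in the order they become known
        self.max_open = 0       # most buffers held at the same time

    def startElement(self, name, attrs):
        if name == "x":
            self.open_buffers.append([])
            self.max_open = max(self.max_open, len(self.open_buffers))

    def characters(self, content):
        # Text belongs to every enclosing <x>, not just the innermost one.
        for buf in self.open_buffers:
            buf.append(content)

    def endElement(self, name):
        if name == "x":
            self.completed.append("".join(self.open_buffers.pop()))


def run(handler, doc=DOC):
    """Feed the document through the handler in a single forward pass."""
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler


if __name__ == "__main__":
    print(run(CountX()).count)
    d = run(DataX())
    print(d.completed, d.max_open)
```

In this sketch the inner value is completed before the outer one, whereas document order puts the outer <x> first, so the results have to be held back and reordered. That buffering is the cost that the count never has to pay.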

Thanks David, that is an outstanding explanation.

Question: So, what is the general principle at work here?

I'll take a stab at answering that question:

	Let's take as reference this XML:

	<Document>
	    <x>
	        <x>A</x>
	        B
	    </x>
	</Document>

	Consider an XPath expression that yields a
	sequence of <x> elements. Now consider an
	operation on that sequence. Is the operation
	on the sequence streamable or not?

	For example, is the operation count() on the
	sequence generated by the XPath expression
	//x streamable or not? Is the operation data()
	on the sequence generated by //x streamable
	or not?

	We must consider two cases:

	Case 1: 	At least one item in the sequence is
		an <x> element nested inside
		another <x>.

		If the operation can be performed just
		by inspecting each item of the sequence,
		then the operation on the sequence is
		streamable.

		If the operation requires going inside each
		item of the sequence, then the operation
		on the sequence is not streamable.

	Case 2: 	No item in the sequence is an <x> element
		nested inside another <x>.
		(The items in the sequence are pairwise disjoint.)

		The operation on the sequence is streamable,
		regardless of whether the operation just
		inspects each item or goes inside each item.
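
The distinction between the two cases can itself be checked in one forward pass, which the following sketch tries to show. It is an illustration under assumptions: NestingCheck and which_case are hypothetical names, Python's xml.sax stands in for a streaming parser, and the check looks at one particular document, whereas a real processor would presumably have to decide streamability from the expression alone, before seeing any data.

```python
import xml.sax


class NestingCheck(xml.sax.ContentHandler):
    """Detects whether any element with the given name sits inside another one."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.depth = 0       # how many elements with this name are open now
        self.nested = False  # set once two of them are open at the same time

    def startElement(self, name, attrs):
        if name == self.name:
            self.depth += 1
            if self.depth > 1:
                self.nested = True

    def endElement(self, name):
        if name == self.name:
            self.depth -= 1


def which_case(doc, name="x"):
    """Return 1 if some <name> is nested inside another <name>, else 2."""
    handler = NestingCheck(name)
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return 1 if handler.nested else 2


if __name__ == "__main__":
    # Assumption: whitespace-free variants of the reference document.
    print(which_case("<Document><x><x>A</x>B</x></Document>"))
    print(which_case("<Document><x>A</x><x>B</x></Document>"))
```

The first call corresponds to Case 1 (overlapping items) and the second to Case 2 (disjoint items). Only a depth counter is needed, so the check itself never has to go back over earlier input.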

Is that correct?

Is it complete? Are there any cases that it misses?

Can you express it more simply and more clearly?

/Roger
