Re: [xsl] Does the count() function require access to the whole subtree?

Subject: Re: [xsl] Does the count() function require access to the whole subtree?
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Thu, 16 Jan 2014 08:57:28 -0800
Imagine you have a nice transformation that uses the "Multiple-Pass
Transformation" design pattern.

This transformation suddenly crashes because the source XML document
that it is typically given has grown huge. Streaming just Pass1
doesn't help here, because the result of Pass1 (or of any other
intermediate pass) is still too large to hold in memory.
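
For reference, the conventional, non-streamed form of the pattern looks
roughly like the sketch below. It is plain XSLT 2.0; the mode names and
the item/entry rules are made up purely for illustration. The point is
that the whole result of Pass1 is held in $vPass1 before Pass2 can
start.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Pass 1: the complete result is built as a temporary tree -->
    <xsl:variable name="vPass1">
      <xsl:apply-templates mode="pass1"/>
    </xsl:variable>
    <!-- Pass 2: starts only after Pass 1 has finished -->
    <xsl:apply-templates select="$vPass1" mode="pass2"/>
  </xsl:template>

  <!-- Identity rule shared by both passes -->
  <xsl:template match="@*|node()" mode="pass1 pass2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical Pass 1 rule: rename item to entry -->
  <xsl:template match="item" mode="pass1">
    <entry>
      <xsl:apply-templates select="@*|node()" mode="pass1"/>
    </entry>
  </xsl:template>

  <!-- Hypothetical Pass 2 rule: drop entries that have no child nodes -->
  <xsl:template match="entry[not(node())]" mode="pass2"/>

</xsl:stylesheet>
```

With a huge input, $vPass1 is roughly as large as the input itself,
whether or not the reading of the source document is streamed.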

Even if we modify all passes to work in streaming mode, we still need
to save the result of each pass to disk and then start a separate, new
transformation for each pass. This is both inconvenient and rather
inefficient.

The solution is to have "streaming output" -- in addition to being
able to stream input.

I believe that this wouldn't be difficult to specify, or for
implementors to implement. Something like this:

   <xsl:variable name="vPass1" streamed-content="yes">
     <xsl:apply-templates mode="streamed-pass1"/>
   </xsl:variable>
   <xsl:variable name="vPass2" streamed-content="yes">
     <xsl:apply-templates select="$vPass1" mode="streamed-pass2"/>
   </xsl:variable>

.   .    .    .    .    .     .

This would allow Pass2 to begin as soon as Pass1 starts to produce
output, without waiting for the complete output of Pass1 to be
produced, then written to disk, and then read back from disk.


This is what *useful* streaming could really be.



On Thu, Jan 16, 2014 at 8:20 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>>
>> For example, your pipeline could collect all //x (streaming) and then
>> reverse them (not streaming).
>>
>> In principle, you would need only the memory for holding //x (or
>> rather, copies of //x or pointers to them), not the entire collection
>> within which they are found.
>
> Generally you can't keep a "pointer" to a streamed node, because it's
> transient. But you can keep a copy.
>
> So you should be able to do
>
> reverse(copy-of(//x))
>
> The result of copy-of(//x) is grounded (it doesn't contain any
> streamed nodes), and that makes it amenable to operations such as
> reverse().
>>
>>
>> I also had a related question, back in September. It wasn't answered
>> (rare for this list), either because it wasn't clear, or because XML
>> Summer School was going on at the time. Or both.
>
> Or possibly because the streaming design was completely up in the air
> at that moment because we had just found a major bug. But I think
> we've still got some issues with multi-phase streaming that need to be
> sorted out (that is, with reading from a stream of nodes that is
> itself constructed by the stylesheet), and such use cases need further
> work: we've got an agenda item to discuss this.
>>
>> http://xsl-list.markmail.org/thread/pwuzpcvdoi7eam4h
>>
> Michael Kay
> Saxonica
>
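
For reference, the copy-of() suggestion quoted above might look like
this as a complete stylesheet. This is only a sketch: it assumes an
XSLT 3.0 processor with streaming support, the wrapper element name
"reversed" is made up, and whether a particular processor accepts the
expression as streamable needs to be checked against that processor.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask for the unnamed mode to be processed with streaming -->
  <xsl:mode streamable="yes"/>

  <xsl:template match="/">
    <!-- "reversed" is a hypothetical wrapper element -->
    <reversed>
      <!-- copy-of() yields grounded copies of the x elements, so only
           those copies (not the whole input) have to be kept in memory
           before reverse() is applied -->
      <xsl:sequence select="reverse(copy-of(//x))"/>
    </reversed>
  </xsl:template>

</xsl:stylesheet>
```

Note that this only bounds memory by the total size of the copied x
elements; it does not help when the intermediate result itself is
huge, which is the case described at the top of this message.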



--
Cheers,
Dimitre Novatchev
