Re: [xsl] Why does my streaming program hang when the input is a streaming web site ?

Subject: Re: [xsl] Why does my streaming program hang when the input is a streaming web site ?
From: "Abel Braaksma (Exselt) abel@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 10 Jun 2014 09:55:32 -0000
On 7-6-2014 15:50, Costello, Roger L. costello@xxxxxxxxx wrote:
> This web site emits a continuous stream of XML:
>
> http://xmpp.wordpress.com:8008/firehose.xml?type=text/plain
>
> <snip />
>
> java saxon9ee.jar -o:Titles.html test.xml Show-Titles.xsl
>
> Any thoughts on why the command line hangs and I get no output?

You touch on a very intricate and subtle point with regard to
streaming: how and when output streaming is allowed or can be achieved.

AFAICT Saxon buffers output, which is allowed, even enforced by the
streaming definition in the current XSL Working Draft. I think that if
you run it long enough, at some point the buffer will get full and you
will see output, albeit incomplete, and potentially in a temporary
file (not sure about the internals of Saxon here).

It reminds me of a discussion at XML London I had with Charles Foster,
and a question that came up at the end of my talk (not sure who to
credit for the question). It was about what happens when output is large
(needs streaming), but input is small. In your case both output and
input are large. I call your input "intrinsically streaming": even if
it isn't large, it must be processed using streaming because the stream
is never-ending and you want intermittent output. But the same question
applies. The answer was: the XSL WD is not prescriptive here, but it
does require the processor to run in constant memory, which at some
point requires buffering and intermittent flushing.

But this is in conflict with another rule. A processor is not allowed to
create an invalid principal output document (it is allowed to do this
with result-document though, as in the case of failure or interruption).
So it must write either everything (when the whole stream was processed
successfully) or nothing (in case of error or interruption). To avoid
leaving a partial document behind, a processor must buffer until it
knows it will finish successfully. So even if it flushes intermittently,
there must be a mechanism that does a rollback in case of failure. In
other words: interrupt your processor, it signals an error, and your
output will be lost (go back to start, do not pass go, do not collect
$200).

Your stylesheet might work differently and more to your expectations if
you switch to using result-document on, say, each atom:entry. That way
the processor can process a complete node and write a complete document,
and it is more likely to flush it to disk each time a node goes out of
focus.
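
A minimal sketch of that idea, under assumptions: I assume the
atom:entry elements are direct children of the document element, that
the final XSLT 3.0 syntax (xsl:mode) is available in your processor, and
the file names entry-N.xml are made up for illustration. Whether the
processor really flushes each document as soon as the entry is done is
not guaranteed by the spec.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom">

  <!-- Process the principal input in streaming mode -->
  <xsl:mode streamable="yes"/>

  <xsl:template match="/">
    <!-- Assumption: entries are children of the document element -->
    <xsl:for-each select="*/atom:entry">
      <!-- One secondary result document per entry; the names
           entry-1.xml, entry-2.xml, ... are hypothetical -->
      <xsl:result-document href="entry-{position()}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Each result document is complete and well-formed by itself, so the
processor has no reason to hold it back until the (never arriving) end
of the input.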

Another option is to output something that is not required to be
validated or well-formed (i.e., text), but I'm not sure if it will
change the behavior.
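
A sketch of the text variant, under the same assumptions (atom:entry as
children of the document element, final XSLT 3.0 syntax). It writes one
title per line. It only shows what the stylesheet would look like, it
does not prove that the output buffering changes.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom">

  <xsl:mode streamable="yes"/>

  <!-- Text output has no well-formedness constraint -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Assumption: entries are children of the document element -->
    <xsl:for-each select="*/atom:entry">
      <xsl:value-of select="atom:title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```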

And yet another option is to customize Saxon to use a different XML
Writer, one that you control. But I'm afraid my knowledge of Saxon and
streaming is not deep enough to give you an example of that, or even to
say whether it can solve the buffering problem.

Cheers,

Abel Braaksma
Exselt XSLT 3.0 streaming processor
http://exselt.net
