Re: [xsl] Why does my streaming program hang when the input is a streaming web site ?

Subject: Re: [xsl] Why does my streaming program hang when the input is a streaming web site ?
From: "Michael Sokolov msokolov@xxxxxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 10 Jun 2014 18:55:39 -0000
On 6/10/2014 2:44 PM, Abel Braaksma (Exselt) abel@xxxxxxxxxx wrote:
On 10-6-2014 16:11, Michael Sokolov msokolov@xxxxxxxxxxxxxxxxxxxxx wrote:
On 6/10/2014 5:55 AM, Abel Braaksma (Exselt) abel@xxxxxxxxxx wrote:
But this is in conflict with another rule. A processor is not allowed to
create a non-valid principal output document (it is allowed to do this
with result-document though, as in the case of failure or interruption).
So it must write either everything (when the whole stream has been
processed successfully) or nothing (in case of error or interruption).
I wasn't aware of this rule, but strictly speaking it's impossible to
achieve given normal file I/O behavior, even when the entire document
is buffered in memory before writing.
The rule does not say anything about what the influence on the
environment is. It states that, during processing, a roll-back must take
place. If the roll-back itself fails, that's a whole different kind of
error. If writing itself fails (disk full, no access etc), that too is a
different kind of error.

More specifically, the rules in the current XSLT 3.0 Last Call WD
Section 8.3 (http://www.w3.org/TR/xslt-30/#try-catch) are (items #3, #4
and #5):

"3. Some fatal errors arising in the processing environment, such as
running out of memory, may cause termination of the transformation
despite the presence of an xsl:try instruction. This is
implementation-dependent."

"4. If the sequence constructor or select expression of the xsl:try
causes execution of xsl:result-document, xsl:message, or xsl:assert
instructions and fails with a dynamic error that is caught, it is
implementation-dependent whether these instructions have any externally
visible effect. The processor is not required to roll back any changes
made by these instructions. The same applies to any side effects caused
by extension functions or extension instructions."

"5. A serialization error that occurs during the serialization of a
final result tree produced using xsl:result-document is treated as a
dynamic error in the evaluation of the xsl:result-document instruction,
and may be caught by a containing xsl:try instruction. A serialization
error that occurs while serializing the implicit final result tree
returned by the initial template is treated as occurring after the
transformation has finished, and cannot be caught."

And the Note below those rules:

"Note: If an error occurs while evaluating an instruction within
xsl:try, then no instruction within the xsl:try has any effect on the
result returned by the xsl:try instruction. This means that if a
processor is streaming the output to a serializer, it needs to adopt a
strategy such as buffering the output in memory so that nothing is
written until successful completion of the xsl:try instruction, or
checkpointing the output so it can be rolled back when an error occurs."
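
The buffering strategy mentioned in that Note can be illustrated with a minimal sketch. It is written in Python rather than XSLT because it models what a processor might do internally, not what a stylesheet author writes. The names run_try, failing_body, ok_body and on_error are hypothetical, and a Python exception stands in for a caught dynamic error; no actual processor is claimed to work this way.

```python
import io
from typing import Callable, TextIO


def run_try(body: Callable[[TextIO], None],
            catch: Callable[[TextIO, Exception], None],
            sink: TextIO) -> None:
    """Run `body` against a private buffer; copy it to `sink` only on success."""
    buffer = io.StringIO()
    try:
        body(buffer)
    except Exception as err:  # assumption: stands in for a caught dynamic error
        # Everything the body wrote is discarded with the buffer;
        # only the output of the catch handler reaches the sink.
        fallback = io.StringIO()
        catch(fallback, err)
        sink.write(fallback.getvalue())
    else:
        # Successful completion: the buffered output is committed in one go.
        sink.write(buffer.getvalue())


def failing_body(out: TextIO) -> None:
    out.write("<item>1</item>")  # partial output that must not be visible
    raise ValueError("simulated dynamic error")


def ok_body(out: TextIO) -> None:
    out.write("<item>1</item><item>2</item>")


def on_error(out: TextIO, err: Exception) -> None:
    out.write("<error/>")
```

The cost of this approach is that nothing reaches the sink until the body has finished, which is exactly what becomes a problem when the body consumes an input that never ends.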

In XSLT 2.0 it was not possible (at least not without extensions) to
catch errors, so either the whole process failed, or it succeeded. In
XSLT 3.0, there is a significant difference between the initial implicit
final result tree and one for xsl:result-document as shown above. As you
can see, I/O errors and such can be considered "fatal errors", that are
outside the scope of the specification, though an implementation may
make certain environmental errors catchable.

Thanks for clearing that up for us.


Writes happen sequentially, so there will certainly be incomplete
intermediate file states which may be readable by other processes,
depending on operating system rules. It seems to me that as a
practical matter it doesn't make sense to spend a lot of time worrying
about it, though, and it's not really a good justification for
buffering on its own.
I agree that in most situations, there is no need for a stylesheet
writer to worry about buffering or not. An exception to that rule is in
the scenario by Roger, where the input stream is never-ending.

I hope that streaming processors will feel free to write intermediate
results as needed; perhaps a <xsl:flush /> instruction wouldn't be
amiss, either :)
I doubt such an instruction will come, but you can always file a request
as a bug-entry against the W3C spec if you think it serves a strong
use-case. To me, it looks more like a counterpart to
saxon:discard-document and, as being part of the infrastructure, should
be in the realm of extension instructions. Or, as we are currently
considering this use-case for Exselt, as a command-line option.

That sounds reasonable to me, although if the buffer size is small enough, it might not be necessary.

-Mike
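
The small-buffer idea from the thread can be sketched as follows. This is a minimal illustration in Python, not a description of how Saxon, Exselt or any other processor behaves; the function name stream_with_bounded_buffer and the max_buffered parameter are hypothetical. Output is written as soon as a fixed number of records is pending, so a never-ending input still produces visible results without an explicit flush instruction.

```python
import io
from typing import Iterable, List, TextIO


def stream_with_bounded_buffer(records: Iterable[str],
                               sink: TextIO,
                               max_buffered: int = 4) -> int:
    """Copy records to `sink`, flushing whenever `max_buffered` are pending.

    Returns the number of flushes performed.
    """
    pending: List[str] = []
    flushes = 0
    for record in records:
        pending.append(record)
        if len(pending) >= max_buffered:
            sink.write("".join(pending))
            sink.flush()  # make the intermediate result visible now
            pending.clear()
            flushes += 1
    if pending:
        # Only reached when the input actually ends.
        sink.write("".join(pending))
        sink.flush()
        flushes += 1
    return flushes
```

With ten records and max_buffered=4 this flushes after the fourth and eighth record and once more at the end. For an unbounded input the final flush is never reached, but at most max_buffered - 1 records are ever held back.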
