Re: [xsl] Why does my streaming program hang when the input is a streaming web site ?

From: "Abel Braaksma (Exselt) abel@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 10 Jun 2014 18:44:00 -0000
On 10-6-2014 16:11, Michael Sokolov msokolov@xxxxxxxxxxxxxxxxxxxxx wrote:
> On 6/10/2014 5:55 AM, Abel Braaksma (Exselt) abel@xxxxxxxxxx wrote:
>>
>> But this is in conflict with another rule. A processor is not allowed to
>> create an invalid principal output document (it is allowed to do this
>> with result-document, though, as in the case of failure or interruption).
>> So it must either write everything (when it has successfully processed
>> the whole stream) or nothing (in case of error or interruption).
> I wasn't aware of this rule, but strictly speaking it's impossible to
> achieve given normal file I/O behavior, even when the entire document
> is buffered in memory before writing.  

The rule says nothing about the effect on the environment. It states
that, when an error is caught during processing, a roll-back must take
place. If the roll-back itself fails, that's a whole different kind of
error. If writing itself fails (disk full, no access, etc.), that too is
a different kind of error.

More specifically, the relevant rules are items #3, #4 and #5 in
Section 8.3 of the current XSLT 3.0 Last Call Working Draft
(http://www.w3.org/TR/xslt-30/#try-catch):

"3. Some fatal errors arising in the processing environment, such as
running out of memory, may cause termination of the transformation
despite the presence of an xsl:try instruction. This is
implementation-dependent."

"4. If the sequence constructor or select expression of the xsl:try
causes execution of xsl:result-document, xsl:message, or xsl:assert
instructions and fails with a dynamic error that is caught, it is
implementation-dependent whether these instructions have any externally
visible effect. The processor is not required to roll back any changes
made by these instructions. The same applies to any side effects caused
by extension functions or extension instructions."

"5. A serialization error that occurs during the serialization of a
final result tree produced using xsl:result-document is treated as a
dynamic error in the evaluation of the xsl:result-document instruction,
and may be caught by a containing xsl:try instruction. A serialization
error that occurs while serializing the implicit final result tree
returned by the initial template is treated as occurring after the
transformation has finished, and cannot be caught."

And the Note below those rules:

"Note: If an error occurs while evaluating an instruction within
xsl:try, then no instruction within the xsl:try has any effect on the
result returned by the xsl:try instruction. This means that if a
processor is streaming the output to a serializer, it needs to adopt a
strategy such as buffering the output in memory so that nothing is
written until successful completion of the xsl:try instruction, or
checkpointing the output so it can be rolled back when an error occurs."
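The two strategies named in the Note can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not how
Exselt or any other processor actually implements xsl:try: the function
names are hypothetical, `body` plays the role of the xsl:try sequence
constructor, `fallback` plays the xsl:catch branch, and an in-memory
`io.StringIO` stands in for the serializer's output stream.

```python
import io
from typing import Callable, TextIO

# Hypothetical helper names; this is not an API of any XSLT processor.
Body = Callable[[TextIO], None]


def run_try_buffered(sink: TextIO, body: Body, fallback: Body) -> None:
    # Strategy 1: buffer the try body's output in memory so that nothing
    # reaches `sink` until the body has completed successfully.
    buffer = io.StringIO()
    try:
        body(buffer)
    except Exception:
        # Error caught: the buffered partial output is simply dropped.
        fallback(sink)
    else:
        sink.write(buffer.getvalue())


def run_try_checkpointed(sink: TextIO, body: Body, fallback: Body) -> None:
    # Strategy 2: write straight to a seekable sink, but remember where the
    # try body started so that its partial output can be rolled back.
    checkpoint = sink.tell()
    try:
        body(sink)
    except Exception:
        sink.seek(checkpoint)
        sink.truncate()  # discard everything written after the checkpoint
        fallback(sink)


def failing_body(out: TextIO) -> None:
    out.write("<partial>")
    raise ValueError("simulated dynamic error")


def catch_branch(out: TextIO) -> None:
    out.write("<error/>")


for strategy in (run_try_buffered, run_try_checkpointed):
    out = io.StringIO()
    out.write("<root>")
    strategy(out, failing_body, catch_branch)
    out.write("</root>")
    print(out.getvalue())  # <root><error/></root>
```

Buffering costs memory proportional to everything the xsl:try body
produces; checkpointing avoids that, but only works when the output can be
repositioned and truncated, which is not the case for a pipe or a network
socket.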

In XSLT 2.0 it was not possible (at least not without extensions) to
catch errors, so the whole transformation either failed or succeeded. In
XSLT 3.0, as the rules above show, there is a significant difference
between the implicit final result tree of the initial template and a
final result tree created with xsl:result-document: a serialization
error in the latter can be caught, in the former it cannot. As you can
see, I/O errors and the like can be considered "fatal errors" that are
outside the scope of the specification, though an implementation may
make certain environmental errors catchable.


> Writes happen sequentially, so there will certainly be incomplete
> intermediate file states which may be readable by other processes,
> depending on operating system rules. It seems to me that as a
> practical matter it doesn't make sense to spend a lot of time worrying
> about it, though, and it's not really a good justification for
> buffering on its own.

I agree that in most situations there is no need for a stylesheet
writer to worry about whether output is buffered. An exception is
Roger's scenario, where the input stream never ends: a processor that
holds back the principal output until the transformation has completed
will never write anything.
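Why an all-or-nothing principal output makes such a program appear to
hang can be shown with a small Python model. Assumptions, all
hypothetical: `itertools.count()` stands in for the never-ending input
stream, `transform` is a stand-in for a per-item template, and `budget`
simulates the user giving up after a number of items. None of this
reflects how a particular processor works.

```python
import io
import itertools
from typing import Iterable, TextIO


def transform(item: int) -> str:
    # Hypothetical per-item template: wrap each input item in an element.
    return f"<item>{item}</item>\n"


def run_all_or_nothing(source: Iterable[int], sink: TextIO, budget: int) -> None:
    # Principal output is held in memory until the input is fully consumed.
    buffer = io.StringIO()
    for count, item in enumerate(source, start=1):
        if count > budget:
            # Interrupted before the input ended: the buffer is discarded
            # and nothing has reached the sink.
            return
        buffer.write(transform(item))
    # Only reached when `source` is finite.
    sink.write(buffer.getvalue())


def run_incremental(source: Iterable[int], sink: TextIO, budget: int) -> None:
    # Each result item is written (and flushed) as soon as it is produced.
    for count, item in enumerate(source, start=1):
        if count > budget:
            return
        sink.write(transform(item))
        sink.flush()


# Hypothetical stand-in for the never-ending streaming web site.
held_back = io.StringIO()
run_all_or_nothing(itertools.count(), held_back, budget=1000)
print(repr(held_back.getvalue()))  # ''

streamed = io.StringIO()
run_incremental(itertools.count(), streamed, budget=3)
print(streamed.getvalue())
```

With the all-or-nothing strategy the sink stays empty however large
`budget` is, because the end of the input is never reached; the
incremental variant produces output immediately, roughly the behaviour
that an xsl:flush-like instruction or a processor option would enable.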

> I hope that streaming processors will feel free to write intermediate
> results as needed; perhaps a <xsl:flush /> instruction wouldn't be
> amiss, either :)

I doubt such an instruction will be added, but you can always file a
request as a bug entry against the W3C spec if you think it serves a
strong use case. To me, it looks more like a counterpart to
saxon:discard-document: as part of the infrastructure, it belongs in the
realm of extension instructions or, as we are currently considering for
this use case in Exselt, a command-line option.

Cheers,

Abel Braaksma
Exselt XSLT 3.0 streaming processor
http://exselt.net
