Re: [xsl] Cannot write more than one result document to the same URI

Subject: Re: [xsl] Cannot write more than one result document to the same URI
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Fri, 5 Apr 2013 09:08:18 +0100
> The problem is that the specification does not require the XSLT processor to
> complete the processing of the first <b> before starting or even ending the
> processing of the second <b>.  Sure a single-process implementation "X" likely
> would.  But a parallelized (is that a word?) implementation "Y" running on
> multiple CPUs could very well fully process the second <b> before the first
> <b> if it chose to do so.  Its only obligation is to arrange the resulting
> tree with the result of processing the first <b> before the result of
> processing the second <b>.  This obligation ensures that the result of
> processing by "X" is identical to the result of processing by "Y".  But there
> is no obligation on what the processor does to get to that result.
>
> When using <xsl:result-document> the processor is not building the result
> tree.  It is creating a completely separate result.  If the instruction
> required "re-opening" of the file for append, processor "X" likely would
> produce the expected result, but processor "Y" in the situation above would
> produce an unexpected result.  Two processors would produce two results.
>
> And this is also why one cannot assert that the writing to the file is even
> finished before the next attempt to write to the file starts.  The file handle
> could very well still be left open by one parallel process when the other is
> ready to open it for itself.  So it can't be used even if the file is opened
> for write and not for append.
>
Indeed. Saxon-EE 9.5 will execute xsl:result-document instructions
asynchronously, so the rule in the spec that you can't write two documents to
the same URI turns out to be very useful. If you do something like this:

<xsl:for-each select="employee">
  <xsl:result-document href="{@ssn}">
    <xsl:copy-of select="."/>
  </xsl:result-document>
</xsl:for-each>

then you might well have a dozen threads operating at once, each copying a
different employee element to a different result file. If the URIs were not
unique, this would cause havoc - in fact the optimization would not really be
possible.
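
For cases where several input elements would map to the same URI, a common way
to stay within the rule is to group by the target URI, so that each URI is
written exactly once. The following is only a sketch under assumptions that are
not in the example above: that two or more employee elements can share an @ssn
value, that the employee elements are children of the document element, and
that a wrapper element (here given the made-up name "employees") is acceptable
in each output file.

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the document element directly contains the employee elements. -->
  <xsl:template match="/*">
    <!-- One group per distinct @ssn value, hence one result document per URI. -->
    <xsl:for-each-group select="employee" group-by="@ssn">
      <xsl:result-document href="{current-grouping-key()}">
        <!-- "employees" is a hypothetical wrapper so that a group with
             several members still yields a well-formed document. -->
        <employees>
          <xsl:copy-of select="current-group()"/>
        </employees>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>

Because every xsl:result-document instruction here still targets a distinct
URI, the instructions remain independent of one another, which is the property
that asynchronous execution relies on.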

I can see why you find the rule irritating - I've been in the same situation
myself - but it's there for a very good reason.

And by the way, your mental model that the result file is closed when the
xsl:result-document end tag is encountered might be a convenient way of
thinking about things, and perhaps not even too far from reality, but it's not
the way the semantics of the language work, and sooner or later it will lead
to difficulties in understanding what's going on. It's a bit like imagining
that when you do readFile("xyz") in Java, the file is closed when the closing
")" is encountered.

Michael Kay
Saxonica
