RE: [xsl] Dynamic pipelining in XSLT 2.0 w/ Saxon extensions

Subject: RE: [xsl] Dynamic pipelining in XSLT 2.0 w/ Saxon extensions
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 19 Jun 2007 10:18:03 -0400

Hi Mike,

At 03:58 AM 6/19/2007, you wrote:
> * This runs (tested under the current Saxon 8.9), but how
> will it scale? In particular, Mike Kay may be able to say
> whether compiled stylesheets are cached when this is run
> over a set of documents.
> If not, wouldn't compiling each stylesheet anew for each
> input document be an impediment?

> I'm not sure of the architecture you are using. If you start a new
> "transformation" to process each "set of documents", then the variables
> associated with the transformation (which include any compiled stylesheets)
> will of course be lost. On the other hand, if you process multiple "sets of
> documents" within a single transformation, then the problem is that the
> documents will be held in memory unless you explicitly discard them using

In testing so far, I've invoked this by naming a subdirectory as input from the command line. If I'm reading your analysis correctly, then, the first problem doesn't obtain (the pipelining stylesheet is invoked once, not repeatedly), but the second does?

I've used saxon:discard elsewhere, however, so I suppose that could be rolled in as well, to ameliorate that problem.
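
As a rough sketch of that idea (not the stylesheet discussed in this thread): the steps could be compiled once into a global variable and every input processed within a single transformation, with each source passed through saxon:discard-document() so it is not retained in the document pool. The parameter defaults, the file names, the f:run function and its namespace are all hypothetical, and the sketch assumes saxon:transform() is what runs each compiled step.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:f="urn:example:pipeline"
    exclude-result-prefixes="xs saxon f">

  <!-- Hypothetical defaults: the pipeline steps and the input documents. -->
  <xsl:param name="steps" as="xs:string*" select="('step1.xsl', 'step2.xsl')"/>
  <xsl:param name="inputs" as="xs:string*" select="('in/a.xml', 'in/b.xml')"/>

  <!-- Evaluated once per transformation, then reused for every input. -->
  <xsl:variable name="compiled" as="item()*"
      select="for $s in $steps return saxon:compile-stylesheet(doc($s))"/>

  <!-- Apply the remaining steps in order; each result tree feeds the next step. -->
  <xsl:function name="f:run" as="node()">
    <xsl:param name="source" as="node()"/>
    <xsl:param name="remaining" as="item()*"/>
    <xsl:sequence select="
      if (empty($remaining)) then $source
      else f:run(saxon:transform($remaining[1], $source),
                 subsequence($remaining, 2))"/>
  </xsl:function>

  <!-- Entry point: a named template, since there is no single source document. -->
  <xsl:template name="main">
    <xsl:for-each select="$inputs">
      <!-- discard-document() returns the document but drops it from the pool,
           so it can be garbage-collected once this iteration is done. -->
      <xsl:variable name="source" select="saxon:discard-document(doc(.))"/>
      <xsl:result-document href="out/{tokenize(., '/')[last()]}">
        <xsl:copy-of select="f:run($source, $compiled)"/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because everything happens inside one transformation, $compiled is built only once; the trade-off is that the run has to start at the named template rather than from a source document or directory.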

> At present certain things are never "early-evaluated" in Saxon, even if all
> the arguments are known at compile time. These include the doc() function
> and extension functions - the theory being that the results of these calls
> might depend on external information that changes between compile time and
> run-time. So a variable such as <xsl:variable name="process"
> select="saxon:compile-stylesheet(doc($stylesheet))"/>, even if promoted to
> be a global variable, would be evaluated anew on each transformation. It
> would be nice to provide additional options in this area - not just for this
> use case, but a more general capability.
>
> * Are there any obvious pitfalls or problems with this
> approach? (Or any not so obvious?) How does it compare to
> other methods?

> I'm inclined to think that a general purpose pipeline processor will do the
> job better. It's likely to have memory management that's better adapted to
> this kind of work, and debugging facilities to examine the documents at any
> stage of the pipeline or to switch validation of intermediate steps on and
> off, etc. If you're lucky it might even allow distributed or asynchronous
> execution of the pipeline.

No doubt. But if I understand it correctly, the major alternative currently on the open source side, Ant, will only pipeline files -- it's not really a "pipeline processor" in the sense you mean.

Using saxon:next-in-chain, we can pipeline using SAX events, but that's a poor man's solution and not as flexible, especially when running a sequence of more than three or four steps.
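
For comparison, a minimal sketch of the saxon:next-in-chain arrangement, assuming a second stylesheet with the hypothetical name step2.xsl. The attribute sits on xsl:output, so each step has to name its own successor, which is part of why longer or reconfigurable chains get awkward.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Hand this stylesheet's result to step2.xsl (hypothetical name)
       instead of serializing it. -->
  <xsl:output saxon:next-in-chain="step2.xsl"/>

  <!-- Placeholder logic for this step: an identity transform. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Reordering or dropping a step here means editing the stylesheets themselves, whereas a list of compiled stylesheets can be assembled at run time.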

Maybe there's a reader who has an alternative to propose, which can (a) support specifying the pipeline as flexibly as this, (b) chain temporary trees like this or something even faster, and (c) scale well?

In the meantime I guess we're waiting for XProc implementations.

Thanks very much for your comments!


Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.      
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
