Re: Practical Suggestion for XSLT Performance Improvement

Subject: Re: Practical Suggestion for XSLT Performance Improvement
From: "Clark C. Evans" <clark.evans@xxxxxxxxxxxxxxxxxxxx>
Date: Thu, 7 Oct 1999 17:40:10 -0400 (EDT)
Chris,

Thank you for your feedback.  ;)

Your perspective seems to come from XML document 
editing.  In this domain, access to the (entire?) source 
document is required as it is being altered.  

My perspective is from back-end stream processing.
In this domain, the source document is sequential
and temporal -- the goal is to "touch-it-once".  
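To make "touch-it-once" concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree.iterparse. Each <entry> is read, used, and then emptied, so memory holds little more than the current record. The element names (timesheet, entry, hours) are hypothetical and loosely modeled on the time-sheet example further down.

```python
import io
import xml.etree.ElementTree as ET

def total_hours(source):
    """Sum every <entry>'s <hours> in one forward pass over the input."""
    total = 0.0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            total += float(elem.findtext("hours", default="0"))
            elem.clear()  # done with this entry: drop its children and text
    return total

doc = b"""<timesheet>
  <entry><day>Mon</day><hours>8</hours></entry>
  <entry><day>Tue</day><hours>7.5</hours></entry>
</timesheet>"""

print(total_hours(io.BytesIO(doc)))  # 15.5
```

Note that elem.clear() empties the entry but leaves the now-empty element attached to its parent. A real stream processor would also detach it.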

On Thu, 7 Oct 1999, Christopher R. Maden wrote:
> Well, not hurt, necessarily, but not be usable with.  
> Both DSSSL and XSLT were careful to avoid side effects 
> so that processing could start at any point and still 
> have the same results.

Yes, I realize this; it makes sense.  It is also why I was 
proposing something _other_ than mutable variables, and why
I went to great lengths quoting Abelson and doing my homework.

> In an editor, for example, if a user alters a document at one 
> point - let's say they insert a new element -  the stylesheet can 
> be re-applied locally to get a close approximation of the result 
> of rendering the entire document. 

This is true _only if_ the stylesheet author sticks to 
<template match="..." /> constructs.  As soon as the stylesheet 
author writes any significant portion of the document with a 
<for-each select="$complicated-xpath-expression" /> construct,
you have lost the war.

The problem with the <for-each select="$complicated-xpath" /> 
construct is that it is iterative and procedural -- it
bypasses that beautiful recursive template matching mechanism.
Therefore, the resulting stylesheet fragments are much harder
to modularize ... I'll get to this later.
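For readers who think in code rather than XSLT, here is a rough Python analogy of the recursive template-matching mechanism. It is not how any real XSLT processor works. Handlers are registered per element name, much like <xsl:template match="...">, and each handler sees only its own node. The registry, the template decorator, and apply_templates are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

TEMPLATES = {}  # hypothetical registry: element name -> handler

def template(tag):
    """Register a handler for one element name, like match="tag"."""
    def register(handler):
        TEMPLATES[tag] = handler
        return handler
    return register

def apply_templates(node):
    """Dispatch on the node's tag; with no matching handler, recurse into
    the children (text nodes are ignored to keep the sketch short)."""
    handler = TEMPLATES.get(node.tag)
    if handler is not None:
        return handler(node)
    return "".join(apply_templates(child) for child in node)

@template("entry")
def entry(node):
    # Sees only its own node; knows nothing about where it sits in the tree.
    return f"{node.findtext('day')}: {node.findtext('hours')}h\n"

doc = ET.fromstring(
    "<timesheet><entry><day>Mon</day><hours>8</hours></entry>"
    "<entry><day>Tue</day><hours>7.5</hours></entry></timesheet>"
)
print(apply_templates(doc), end="")
```

The entry handler never names the path to its node. As a result, <entry> elements could be regrouped under new parents without touching it, whereas a for-each over a fixed path such as /timesheet/entry would have to be rewritten.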

The pipe construct is meant as a replacement for the 
<for-each select="complicated-xpath" /> construct; so in 
this domain, locality of change has already been lost.

> Similarly, in a browser, if a user requests a fragment of a 
> document, there's no reason that the browser has to render 
> everything leading up to the beginning of the fragment, at   
> least not right away.

Ok.  But a dependency graph (or something similar) needs to be 
generated by the XSL parser before this rendering occurs.
 
> With named pipes, this means that the children will be 
> rendered correctly,

Exactly; so we are still good to go... so far.

> then the pipes haven't been created and the children 
> won't write to them.  When the parent is processed (possibly 
> after the children), the pipe won't be there for proper 
> processing to occur.

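Sorry about my clumsy/misleading wording.  This need not be 
the case: a named pipe (FIFO) exists in the filesystem
independently of the processes that use it, and opening one
end blocks until the other end is opened, so the order in which
parent and children are processed does not matter.  See Stevens,
UNIX Network Programming, Vol. II, p54.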
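Here is a minimal, POSIX-only Python sketch of that behavior using os.mkfifo. The FIFO is created up front. The child's open() for writing blocks until the parent opens the FIFO for reading, and the data then arrives intact. The pipe name "child-to-parent" is an assumption, and threads stand in for separate processes to keep the sketch short.

```python
import os
import tempfile
import threading

def fifo_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "child-to-parent")
        os.mkfifo(path)  # the named pipe exists before anyone opens it

        def child():
            # open() for writing blocks until some reader opens the FIFO
            with open(path, "w") as pipe:
                pipe.write("child result\n")

        writer = threading.Thread(target=child)
        writer.start()
        # open() for reading blocks until the writer has opened its end;
        # read() then returns everything written, up to the writer's close
        with open(path) as pipe:
            received = pipe.read()
        writer.join()
    return received

print(fifo_demo(), end="")  # prints: child result
```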

> It's true that to get precise rendering, you have to start 
> at the root rule and evaluate other templates only when explicitly 
> instructed to do so by an <xsl:apply-templates/> or such.  

Yep.  And this is always the case for back-end processing.

> However, even in that environment, a fast processor might jump 
> ahead and prepare child result nodes for use by their parents, 
> and that's prevented by the pipes approach.

How so?  If a parent were to read from a named pipe before the 
child had been processed, it would simply block.  This is what 
would happen in a "smart-DOM" anyway... so it is no worse than
the current approach.
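The same point can be sketched in Python, with an in-process queue.Queue standing in for the named pipe (an assumption, not part of the proposal). Whichever thread is started first, the parent blocks in get() until the child has written, so the rendered result does not depend on processing order. The render function and the section/p markup are hypothetical.

```python
import queue
import threading

def render(parent_first):
    pipe = queue.Queue()  # stands in for the child-to-parent named pipe
    result = []

    def parent():
        child_text = pipe.get()  # blocks until the child has written
        result.append(f"<section>{child_text}</section>")

    def child():
        pipe.put("<p>child content</p>")

    threads = [threading.Thread(target=parent), threading.Thread(target=child)]
    if not parent_first:
        threads.reverse()  # only changes start order, not the result
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]

print(render(parent_first=True))   # <section><p>child content</p></section>
print(render(parent_first=False))  # same output
```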

> And I disagree that the pipes approach would permit disposition 
> of the input document immediately;

This is one of the goals of back-end processing: dispense with
information that will not be used in the future.  Let's
talk about this one... because I'm positive that the pipes 
approach would give huge savings for back-end processors.

> only in very limited cases are there never going to be 
> back-references in the data,

Correct.  For instance, in my weekly-time-sheet document type
there is a header with person-name and week-ending-date.
Being able to select //person-name or //week-ending-date is very powerful.

However, the bulk of these back-references are to ancestors,
while the bulk of the memory held during processing is wasted on
preceding siblings.

> and  asking a stylesheet author to explicitly account for 
> every possible one is unreasonable. 

Obviously you would have to put parents and other ancestors
on a "node stack".  And other "obvious" nodes that have 
references to them can be put in memory as well.
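Here is a rough Python sketch of the "node stack" idea, again using ElementTree.iterparse with hypothetical element names. Ancestors are kept on a stack, and a small set of "obvious" back-reference targets (person-name, week-ending-date) is cached. Each finished <entry> is detached from its parent, so preceding siblings never accumulate. Which nodes count as "obvious" is hard-coded here. A processor would presumably have to work that out from the stylesheet.

```python
import io
import xml.etree.ElementTree as ET

KEEP = {"person-name", "week-ending-date"}  # hypothetical "obvious" targets

def stream_entries(source):
    ancestors = []  # node stack: the open elements above the current one
    header = {}     # cached values that later nodes refer back to
    rows = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            ancestors.append(elem)
            continue
        ancestors.pop()  # elem is complete; the stack now holds its ancestors
        if elem.tag in KEEP:
            header[elem.tag] = elem.text
        elif elem.tag == "entry":
            rows.append((header.get("person-name"),
                         header.get("week-ending-date"),
                         elem.findtext("day"),
                         float(elem.findtext("hours"))))
            ancestors[-1].remove(elem)  # discard it: no preceding siblings pile up
    return rows

doc = b"""<timesheet>
  <header><person-name>J. Smith</person-name>
          <week-ending-date>1999-10-08</week-ending-date></header>
  <entry><day>Mon</day><hours>8</hours></entry>
  <entry><day>Tue</day><hours>7.5</hours></entry>
</timesheet>"""

for row in stream_entries(io.BytesIO(doc)):
    print(row)
```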

On the other hand, if child information needs to be accessible 
by the parent, the story is entirely different.  In this case,
I *do* feel it is reasonable to ask the stylesheet author
to identify such information and put it on an up-stream pipe.

In fact, not only is it reasonable, it would be great practice 
to do this.  The act of marking information as usable by a 
parent (by putting it in the pipe) makes for a clear, documented
interface.  Currently, we have children accessing parents and 
parents accessing children... in short, it's a great recipe 
for sticky spaghetti.
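Here is a minimal Python sketch of such an explicit up-stream pipe, assuming it behaves like a simple append-only channel. Each child template publishes its hours, and the parent computes the total from the pipe alone, never reaching into the children's structure. The Upstream class and both template functions are hypothetical.

```python
class Upstream:
    """Hypothetical up-stream pipe: the one documented channel from
    child templates to their parent."""

    def __init__(self):
        self._items = []

    def put(self, value):
        self._items.append(value)

    def drain(self):
        items, self._items = self._items, []
        return items

def entry_template(day, hours, up):
    up.put(hours)  # publish exactly what the parent is allowed to use
    return f"{day}: {hours}h"

def week_template(entries):
    up = Upstream()
    lines = [entry_template(day, hours, up) for day, hours in entries]
    # The parent reads only the pipe, never the children's structure.
    lines.append(f"Total: {sum(up.drain())}h")
    return "\n".join(lines)

print(week_template([("Mon", 8), ("Tue", 7.5)]))
```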

Furthermore, if complicated/dynamic xpath expressions are not used, 
then the xsl processor has a damn good chance of knowing which 
preceding siblings can be safely discarded.

> Your view of your model as a sort of assembly model, 

I did write that; but after thinking more about it I'm
going to retract that idea -- sorry to confuse the issue.

The focus is twofold:

(1) Efficiency 
 
  The right of the processor to discard information that will 
  not be used in the future (touch-it-once).  

(2) Modularity

  Allowing child templates and parent templates to vary 
  independently not only in location, but also in structure.

In both of these categories, the for-each(complicated-xpath-expression)
is poor at delivering the goods -- and named pipes would excel.

Is this caffeine without the jitters?  Nope.  With this power come 
synchronization issues; however, these issues are well known and 
I think the benefits are well worth it.

Hope this explains better,

Clark

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list