Re: [xsl] how to estimate speed of a transformation

Subject: Re: [xsl] how to estimate speed of a transformation
From: David Tolpin <dvd@xxxxxxxxxxxxxx>
Date: Wed, 10 Dec 2003 21:06:29 +0400 (AMT)
> On Wed, Dec 10, 2003 at 05:35:13PM +0400, David Tolpin wrote:
> > How can I ensure that differences between processors cause linear changes in speed? How
> > can I predict space consumtion?
> 
>   You can't ! Well not in a reasonable way IMHO:
> 
> Processors like Saxon and jd use the fact that their implementation
> language is garbage-collected to reuse nodes in things like node-set()
> implementation. xsltproc usually can't do this kind of things. this
> also mean you cannot predict the cost of an stylesheet execution
> because you would have to model the cost of the garbage collection
> which may occur asynchronously, during the transform, after the transform
> or any combination of those depending on the complexity of the stylesheet,
> the size of the data, the memory pressure inside the JVM, etc ...

Whether an implementation is garbage-collected does not depend on the fact
whether it is written in Java or not. Node re-use can as well be implemented
in C, B, Pascal, or any other language. Besides, GC is not a magic property of Java.

GC has little relation to optimizations advanced processors perform.
jd.xslt and saxon display consistent results under all versions (1.1.8 to 1.3.1)
of JDK available to me; however, they have limited support for tail-recursion,
that is, only 'simple' cases are optimized correctly. On the other hand, they
perform expressions optimizations rooted in immutability of data, but I cannot
find any source besides the source code that describes which optimizations
are performed exactly.

Memory use is also implemented differently, not in terms of Java GC, but in terms
of creation and allocation of temporary objects. 

> The estimation of the complexity cannot simply be derived from a simple
> parsing and analysis of the stylesheet. It depends on the processor
> implementation, the implementation of the JVM if using one, and on the
> version of the processor used.

For XSLT, as well as for most programming languages, it is possible 
to estimate complexity for a non-optimizing implementation; such as 
James Clark's XT, which is very fast but does not do much beyond verbose
interpretation of the transformation.

The fact is that, due to restrictions of the data model of XSLT, there
are several optimizations which are always possible and are caused not
by sophistication of an underlying layer, but by features of the language
itself, that is, XSLT.

My questions are following: 

- what are these optimizations?
- how these optimizations affect computation complexity for certain operations?
- how 'much' of the 'mandatory' optimizations is implemented in each of widely used 
  implementations?

These questions have no relation to either implementation language or execution platform.
They should have their answers regardless of the implementation media.

Having answers to these questions would help program advanced algorithms adequately,
without recourse to testing with each of a dozen of available implementations. In
particular, while writing stylesheets with only a basic implementation in mind helps
ensure that a stylesheet behaves adequately (in terms of memory usage and speed) in
the worst case, knowing which expressions are in fact evaluated recurrently and which
data structures are re-used can help in constructing programs that benefit from optimizations
when the optimizations are available. Again, the optimizations of this class should not
be an 'art', they can be specified formally and in fact are features of the language.

David Tolpin
http://davidashen.net/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread