Re: [xsl] Max size?

Subject: Re: [xsl] Max size?
From: "J.Pietschmann" <j3322ptm@xxxxxxxx>
Date: Thu, 09 Jan 2003 15:05:18 +0100
Michael Kay wrote:
> I can't speak with any authority about Xalan, but my understanding is
> that it builds the tree concurrently with doing the transformation; by
> the time you've finished, you will normally have the complete tree in
> memory. This has benefits, but the memory you need still increases
> linearly with document size.

I don't think so. Once incremental processing is enabled, and
provided the style sheet is suitable, you can process much larger
inputs without running out of memory. Well, it still runs out of
memory ultimately, but I don't think it's only streaming the
result; significant parts of the information associated with the
input seem to be discarded too.

> The interesting challenge is to work out when you can discard parts of
> the tree that won't be needed again. I think this could be done quite
> easily for a small class of very simple stylesheets, but the general
> problem is quite hard.

I think it should be possible to assert by static analysis
whether a given template accesses only descendants of the context
node. If this can be asserted for all templates in the style
sheet, and if you can arrange the processing within a template so
that nodes are accessed only once locally, you can discard nodes
processed by directly called templates from memory. Making such
assertions shouldn't be that hard if the XPath expressions within
the templates use only nodes from the descendant-or-self axis.

It may be an indication of something like this that Xalan's
memory usage increases quite a bit for the same input if the
stylesheet uses a sibling axis somewhere, even if the same result
is produced.

One of the more interesting questions: if you have a schema for
the input and can afford to barf in mid-processing if the input
doesn't validate, the structure information should allow far
better assertions from static analysis.
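
To make the idea concrete, here is a rough sketch of such a check, written in Python because the thread itself contains no code. It is only an illustration and says nothing about how Xalan actually works. The function name, the regular expression, and the two sample stylesheets are all made up for this example. It scans only the `select` and `test` attributes of XSLT 1.0 instructions with a crude textual pattern rather than a real XPath parser, so it errs on the side of flagging too much, and it ignores match patterns and attribute value templates entirely.

```python
import re
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Attributes that carry XPath expressions on XSLT 1.0 instructions.
# Assumption: match patterns and attribute value templates are not checked.
XPATH_ATTRS = ("select", "test")

# Textual signs that an expression may reach outside the context node's
# subtree. A conservative heuristic, not an XPath parser: string literals
# containing these tokens are flagged as well.
OUTSIDE_SUBTREE = re.compile(
    r"""
    \b(?:parent|ancestor|ancestor-or-self|preceding|preceding-sibling|
        following|following-sibling)\s*::      # explicit non-descendant axes
    | \.\.                                      # abbreviated parent step
    | (?:^|[^\w\)\]\.\*/])\s*/                  # path starting at the root
    | \b(?:id|key|document)\s*\(                # functions that jump elsewhere
    """,
    re.VERBOSE,
)


def templates_leaving_subtree(stylesheet_text):
    """Return (template label, expression) pairs that may look outside
    the subtree of the context node."""
    root = ET.fromstring(stylesheet_text)
    findings = []
    for template in root.iter(f"{{{XSL_NS}}}template"):
        label = template.get("match") or template.get("name") or "?"
        for node in template.iter():
            for attr in XPATH_ATTRS:
                expr = node.get(attr)
                if expr and OUTSIDE_SUBTREE.search(expr):
                    findings.append((label, expr))
    return findings


# Hypothetical stylesheet whose templates only look downwards.
LOCAL = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="order">
    <xsl:apply-templates select="item"/>
  </xsl:template>
  <xsl:template match="item">
    <xsl:value-of select="descendant-or-self::price"/>
  </xsl:template>
</xsl:stylesheet>"""

# Same stylesheet, but one template now peeks at a sibling.
SIBLING = LOCAL.replace(
    'select="descendant-or-self::price"',
    'select="following-sibling::item[1]/price"',
)

print(templates_leaving_subtree(LOCAL))
print(templates_leaving_subtree(SIBLING))
```

An empty result would be the precondition for discarding nodes once the directly called templates are done with them. Any finding means the processor has to keep more of the tree around, which would match the extra memory observed with sibling axes.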

J.Pietschmann


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


