RE: [xsl] memory usage of xslt processing

Subject: RE: [xsl] memory usage of xslt processing
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 19 Apr 2006 13:59:08 +0100
XSLT processors generally read the whole document into memory. Some products
may be able to avoid this under certain circumstances, for example see
http://www.saxonica.com/documentation/sourcedocs/serial.html for Saxon.

Running one transformation per row is certainly feasible in principle though
there may be a significant start-up overhead - you'll only find out by
measurement.

Alternatively, why not retrieve the data from the database in
transformer-sized chunks?
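
To illustrate the chunking idea, here is a minimal sketch using only the standard JAXP API. It assumes rows arrive one at a time from an iterator and that the stylesheet never needs to look across rows. The class name ChunkedTransform, the method names, and the CSV stylesheet are hypothetical, made up for this example and not taken from the original post. The stylesheet is compiled once into a Templates object, and each chunk is wrapped in its own small <dataset> document and transformed separately, so only one chunk is held in memory at a time.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChunkedTransform {

    // Hypothetical stylesheet (not from the original post): one CSV line per <row>.
    static final String CSV_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/dataset'>"
      + "<xsl:for-each select='row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:value-of select='.'/>"
      + "<xsl:if test='position() != last()'>,</xsl:if>"
      + "</xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Escape the characters that are significant in XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap one chunk of rows in the generic <dataset>/<row>/<value> markup.
    static String toDatasetXml(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("<dataset>");
        for (String[] row : rows) {
            sb.append("<row>");
            for (String v : row) {
                sb.append("<value>").append(escape(v)).append("</value>");
            }
            sb.append("</row>");
        }
        return sb.append("</dataset>").toString();
    }

    // Compile the stylesheet once, then run one transformation per chunk.
    // chunkSize is expected to be at least 1.
    static void transformInChunks(Iterator<String[]> rows, int chunkSize, Writer out)
            throws TransformerException {
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(CSV_XSL)));
        List<String[]> chunk = new ArrayList<>();
        while (rows.hasNext()) {
            chunk.add(rows.next());
            if (chunk.size() == chunkSize || !rows.hasNext()) {
                // A fresh Transformer per chunk; the compiled Templates is reused.
                templates.newTransformer().transform(
                    new StreamSource(new StringReader(toDatasetXml(chunk))),
                    new StreamResult(out));
                chunk.clear();
            }
        }
    }

    public static void main(String[] args) throws TransformerException {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "dog", "an animal may be dangerous"},
            new String[] {"2", "cat", "an animal likes milk"});
        StringWriter out = new StringWriter();
        transformInChunks(rows.iterator(), 1, out);
        System.out.print(out);
    }
}
```

With a chunk size of 1 this becomes the one-transformation-per-row approach, so the same code can be used to measure the start-up overhead at different chunk sizes. Any header or footer that must appear once per output file would have to be written by the calling code, since the stylesheet only ever sees one chunk.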

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Thomas Porschberg [mailto:thomas.porschberg@xxxxxxxxx] 
> Sent: 19 April 2006 13:36
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] memory usage of xslt processing
> 
> Hi,
> 
> I have the following task:
> Create an arbitrarily formatted file (XML/HTML/CSV, whatever) 
> based on a SELECT from a database.
> 
> As a constraint the amount of data fetched from the database 
> can not be stored in memory as a whole.
> Another constraint is that I can not use XML-functionality in 
> the database, I have to implement the functionality on top of 
> our database access framework. This database access framework 
> fetches records one at a time.
> And I have to use Java and Xalan.
> 
> My idea was to decorate every fetched row from the database 
> with simple generic XML and fire this to Xalan.
> 
> Let me give an example:
> If my result set from the database looks like:
> 
> ID  Name  Description
> --  ----  -----------
> 1  "dog"  "an animal may be dangerous"
> 2  "cat"  "an animal likes milk"
> 
> I create the following XML:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <dataset>
>  <row>
>   <value>1</value>
>   <value>dog</value>
>   <value>an animal may be dangerous</value>
>  </row>
>  <row>
>   <value>2</value>
>   <value>cat</value>
>   <value>an animal likes milk</value>
>  </row>
> </dataset>
> 
> I create this XML by firing SAX events from a Java 
> class [StringArrayXMLReader], which implements the 
> org.xml.sax.XMLReader interface.
> I have three methods:
> 
> public void init() throws SAXException {
>         ch.startDocument(  );
>         ch.startElement("","dataset","dataset",EMPTY_ATTR);
> }
> 
> public void close() throws SAXException {
>         ch.endElement("","dataset","dataset");
>         ch.endDocument(  );
> }
> 
> public void parse(String [] input) throws SAXException {
>         ch.startElement("","row","row",EMPTY_ATTR);
>         for (int i = 0; i< input.length; ++i){
>            ch.startElement("","value","value",EMPTY_ATTR);
>            ch.characters(input[i].toCharArray(), 0, input[i].length());
>            ch.endElement("","value","value");
>        }
>        ch.endElement("","row","row");
> }
> 
> The parse method creates the <row>...</row> entries for the 
> String array passed to it.
> The StringArrayXMLReader is associated with a 
> TransformerHandler, which uses an XSL stylesheet to transform 
> the XML to the desired output.
> 
> What happens is that when the fetch from the database 
> starts I call init() (and thus startDocument()), and 
> after the fetch has finished I call close() (and thus 
> endDocument()).
> I observed that the XSLT processing starts only when endDocument() 
> is called.
> This is not acceptable for me because I fear the XSLT 
> processor reads all the rows into memory until endDocument() 
> is called, and in that case I risk running into an OutOfMemoryError.
> 
> My second idea was to eliminate the init()/close() methods 
> and to consider one <row>...</row> section as complete 
> document input for the processor. This has the disadvantage 
> that I have to create the head and tail of the document 
> manually (and in my example I get a NullPointerException when 
> the transformer is called twice).
> 
> I have the following questions:
> Is it possible to create the output without holding all the 
> data in memory?
> The basic XML for the XSLT processing
> <dataset>
>   <row><value>...
>   <row><value>...
> </dataset>
> looks very simple and the supplied XSL stylesheets will 
> not be complex, so my hope is to get it working.
> I also think that the task in general - producing formatted 
> output from a potentially very large data pool - should be a common one.
> Unfortunately I have not done much XSLT processing in the past, 
> so I lack the experience (only a little libxslt, which I fed a DOM tree). 
> If someone has some useful links I would be very glad to hear of them. 
> My test code I provide at:
> 
> http://randspringer.de/sax_row.tar and
> http://randspringer.de/sax.tar
> 
> If someone could have a look at it I would really appreciate it.
> 
> Thomas
> 
> 
> -- 
