Re: [xsl] memory usage of xslt processing

Subject: Re: [xsl] memory usage of xslt processing
From: JAPISoft <public2@xxxxxxxxxxxx>
Date: Wed, 19 Apr 2006 15:29:19 +0200
Hello Michael,

Shouldn't it depend on the XPath expressions used in your XSLT?

If I use "//*" on a document fragment, what possibilities would there be?

I was wondering whether a tool that analyses the XPath expressions in an XSLT document and builds a kind of
node graph showing the scope of each expression would make sense.
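
A very crude first step towards such a tool might look like the sketch below. It is only an illustration of the idea, not a real XPath analysis: it does a plain textual check on the select, match and test attributes of XSLT elements and reports those that use "//" or a descendant axis, i.e. expressions whose scope is a whole subtree. The class name XPathScopeScan and the method wideExpressions are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathScopeScan {
    static final String XSLT_NS = "http://www.w3.org/1999/XSL/Transform";
    // Attributes that commonly hold XPath expressions or patterns (not an exhaustive list).
    static final String[] XPATH_ATTRS = {"select", "match", "test"};

    /** Returns, in document order, the expressions that appear to address a whole subtree. */
    static List<String> wideExpressions(String stylesheet) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(stylesheet)));
            NodeList elements = doc.getElementsByTagNameNS(XSLT_NS, "*");
            List<String> found = new ArrayList<>();
            for (int i = 0; i < elements.getLength(); i++) {
                Element e = (Element) elements.item(i);
                for (String attr : XPATH_ATTRS) {
                    String expr = e.getAttribute(attr); // "" when the attribute is absent
                    // Textual heuristic only: a "//" inside a string literal is also flagged.
                    if (expr.contains("//") || expr.contains("descendant")) {
                        found.add(expr);
                    }
                }
            }
            return found;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not scan stylesheet", ex);
        }
    }
}
```

A real tool would need to parse the expressions properly and follow template calls; this sketch only shows where the expressions can be collected from.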


Best regards,

A.Brillant


----- Original Message ----- From: "Michael Kay" <mike@xxxxxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, April 19, 2006 2:59 PM
Subject: RE: [xsl] memory usage of xslt processing



XSLT processors generally read the whole document into memory. Some products
may be able to avoid this under certain circumstances, for example see
http://www.saxonica.com/documentation/sourcedocs/serial.html for Saxon.


Running one transformation per row is certainly feasible in principle though
there may be a significant start-up overhead - you'll only find out by
measurement.


Alternatively, why not retrieve the data from the database in
transformer-sized chunks?
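
One way to read the two suggestions above (one transformation per row, or transformer-sized chunks) is sketched below. It is only a minimal illustration under stated assumptions and has not been tried against Xalan specifically: it uses the standard JAXP API and compiles the stylesheet once into a Templates object, so that each chunk only needs a new Transformer. The class name ChunkedTransform, the CSV stylesheet and the chunkSize parameter are hypothetical. A chunkSize of 1 corresponds to one transformation per row.

```java
import java.io.StringReader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ChunkedTransform {

    // Hypothetical stylesheet: writes one comma-separated line per <row>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/dataset'><xsl:apply-templates select='row'/></xsl:template>"
      + "<xsl:template match='row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:if test='position() &gt; 1'>,</xsl:if><xsl:value-of select='.'/>"
      + "</xsl:for-each><xsl:text>&#10;</xsl:text></xsl:template>"
      + "</xsl:stylesheet>";

    // Escape the characters that are not allowed in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap a chunk of rows in the <dataset>/<row>/<value> structure.
    static String toXml(List<String[]> chunk) {
        StringBuilder sb = new StringBuilder("<dataset>");
        for (String[] row : chunk) {
            sb.append("<row>");
            for (String v : row) {
                sb.append("<value>").append(escape(v)).append("</value>");
            }
            sb.append("</row>");
        }
        return sb.append("</dataset>").toString();
    }

    /** Transforms the rows chunk by chunk; only one chunk is held in memory at a time. */
    static void transformInChunks(Iterator<String[]> rows, int chunkSize, Writer out) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        try {
            // Compile the stylesheet once; Templates can be reused for every chunk.
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSL)));
            List<String[]> chunk = new ArrayList<>(chunkSize);
            while (rows.hasNext()) {
                chunk.add(rows.next());
                if (chunk.size() == chunkSize || !rows.hasNext()) {
                    templates.newTransformer().transform(
                            new StreamSource(new StringReader(toXml(chunk))),
                            new StreamResult(out));
                    chunk.clear();
                }
            }
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Every chunk appends its result to the same Writer, so anything that must appear only once in the output (a header line, a document head or tail) would have to be written by the caller before and after the loop. Whether the per-chunk overhead is acceptable can only be found out by measurement.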

Michael Kay
http://www.saxonica.com/

-----Original Message-----
From: Thomas Porschberg [mailto:thomas.porschberg@xxxxxxxxx]
Sent: 19 April 2006 13:36
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] memory usage of xslt processing

Hi,

I have the following task:
Create an arbitrary formatted file (XML/HTML/CSV whatever)
based on a Select from a database.

As a constraint the amount of data fetched from the database
can not be stored in memory as a whole.
Another constraint is that I can not use XML-functionality in
the database, I have to implement the functionality on top of
our database access framework. This database access framework
fetches the records one at a time.
And I have to use Java and Xalan.

My idea was to wrap every row fetched from the database
in simple generic XML and feed it to Xalan.

Here is an example:
If my result set from the database looks like:

ID  Name  Description
--  ----  -----------
1  "dog"  "an animal may be dangerous"
2  "cat"  "an animal likes milk"

I create the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
 <row>
  <value>1</value>
  <value>dog</value>
  <value>an animal may be dangerous</value>
 </row>
 <row>
  <value>2</value>
  <value>cat</value>
  <value>an animal likes milk</value>
 </row>
</dataset>

I generate this XML as SAX events in a Java class
(StringArrayXMLReader) that implements the
org.xml.sax.XMLReader interface.
It has three methods:

// Opens the document and the <dataset> root element.
public void init() throws SAXException {
    ch.startDocument();
    ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
}

// Closes the <dataset> root element and the document.
public void close() throws SAXException {
    ch.endElement("", "dataset", "dataset");
    ch.endDocument();
}

// Emits one <row> element with one <value> child per array entry.
public void parse(String[] input) throws SAXException {
    ch.startElement("", "row", "row", EMPTY_ATTR);
    for (int i = 0; i < input.length; ++i) {
        ch.startElement("", "value", "value", EMPTY_ATTR);
        ch.characters(input[i].toCharArray(), 0, input[i].length());
        ch.endElement("", "value", "value");
    }
    ch.endElement("", "row", "row");
}

The parse method creates the <row>...</row> entry for the
String array passed to it.
The StringArrayXMLReader is associated with a
TransformerHandler, which uses an XSL stylesheet to transform
the XML to the desired output.
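
For readers who want to reproduce this setup, the wiring can be sketched roughly as follows. This is a self-contained approximation under assumptions, not the code from the archives linked below: it skips the XMLReader class and fires the SAX events directly at a TransformerHandler obtained from a SAXTransformerFactory. The class name RowEventsDemo and the CSV stylesheet are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class RowEventsDemo {
    static final AttributesImpl EMPTY_ATTR = new AttributesImpl();

    // Hypothetical stylesheet: writes one comma-separated line per <row>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/dataset'><xsl:apply-templates select='row'/></xsl:template>"
      + "<xsl:template match='row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:if test='position() &gt; 1'>,</xsl:if><xsl:value-of select='.'/>"
      + "</xsl:for-each><xsl:text>&#10;</xsl:text></xsl:template>"
      + "</xsl:stylesheet>";

    // Same event sequence as the parse(String[]) method above.
    static void row(ContentHandler ch, String[] input) throws SAXException {
        ch.startElement("", "row", "row", EMPTY_ATTR);
        for (String value : input) {
            ch.startElement("", "value", "value", EMPTY_ATTR);
            ch.characters(value.toCharArray(), 0, value.length());
            ch.endElement("", "value", "value");
        }
        ch.endElement("", "row", "row");
    }

    /** Feeds all rows as one SAX document into the handler and returns the result. */
    static String run(String[][] rows) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler ch = factory.newTransformerHandler(
                    new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            ch.setResult(new StreamResult(out));

            ch.startDocument();                                  // init()
            ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
            for (String[] r : rows) {
                row(ch, r);                                      // parse(r)
            }
            ch.endElement("", "dataset", "dataset");             // close()
            ch.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Calling run with the two example rows should yield one comma-separated line per row. All rows belong to a single SAX document here, which is exactly the arrangement whose memory behaviour is in question.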

What happens is that when the fetch from the database
starts I call init() (and thus startDocument()), and at
the end, after the fetch has finished, I call close() (and thus
endDocument()).
I observed that the XSLT processing starts only when endDocument()
is called.
This is not acceptable for me, because I fear the XSLT
processor holds all the rows in memory until endDocument()
is called, in which case I risk running into an OutOfMemoryError.

My second idea was to eliminate the init()/close() methods
and to treat each <row>...</row> section as a complete
input document for the processor. This has the disadvantage
that I have to create the head and tail of the document
manually (and in my example I get a NullPointerException when
the transformer is called a second time).

I have the following questions:
Is it possible to create the output without having all the
data in memory?
The input XML for the XSLT processing
<dataset>
  <row><value>...
  <row><value>...
</dataset>
looks very simple, and the supplied XSL stylesheets will
not be complex, so my hope is to get it working.
I also think that the general task - producing formatted
output from a potentially very large data pool - should be a common one.
Unfortunately I have not done much XSLT processing in the past,
so I lack experience (only a little with libxslt, which I fed a DOM tree).
If someone has some useful links I would be very glad to hear of them.
I provide my test code at:

http://randspringer.de/sax_row.tar and
http://randspringer.de/sax.tar

If someone could have a look at it I would really appreciate it.

Thomas

