Subject: Re: [xsl] Incremental transformations with Xalan and performance issues?
From: Marian Olteanu <mou_softwin@xxxxxxxxx>
Date: Sat, 4 Dec 2004 22:46:42 -0800 (PST)

--- Andrzej Jan Taramina <andrzej@xxxxxxxxxxx> wrote:

> I'm in a situation where I need to parse some large documents, where the
> first few elements are a preamble with various parameters and the end of the
> document is a large list of entries.
>
> Think of a mail merge, where the letter to be sent is defined first in the
> mail merge xml, followed by numerous recipient entries, something like this:
>
> <mailmerge>
>   <letter>
>     ...letter def goes here
>   </letter>
>   <recipients>
>     <recipient>
>       ...recipient data
>     </recipient>
>     <recipient>
>       ...recipient data
>     </recipient>
>     etc...
>   </recipients>
> </mailmerge>
>
> What I was wondering was how Xalan handles the processing of such large
> documents (say a million recipient entries) when the parser is using SAX?
>
> More specifically, if I create global variables such as:
>
> <xsl:variable name="letterTemplate" select="/mailmerge/letter"/>
>
> then later:
>
> <xsl:template match="recipients/recipient">
>   <!-- process the recipient using $letterTemplate -->
> </xsl:template>
>
> Will the processing be incremental in nature, as SAX events are received by
> Xalan? That is, is Xalan smart enough to create the global as soon as it
> can, followed by processing of each individual recipient as each related SAX
> event is received? In that case, having the shared global info early in the
> document and the large list at the end would probably have beneficial
> performance implications.
>
> Or will the whole document have to be instantiated as some sort of internal
> tree first?
>
> Hopefully, it's incremental in nature, since otherwise we might blow out
> memory with such large documents.
>
> Any insight into the implications of processing such large documents, using
> globals, xslt stylesheet structure, impact of element ordering in the
> document and the like would be very much appreciated.
>
> Thanks!
>

First of all, my experience says that if you are concerned about performance,
stay away from Xalan.

I must admit that I haven't been concerned with XSLT and speed since the
summer of 2002 (when school had me work on an XSLT compiler, in which I was
focused on speed, but not on incremental parsing :-D , because I didn't really
find a good application for it). Testing different processors, I got the
following results:

          AXXEL/1  AXXEL/3    XSLTC    XALAN   MSXML4   MSXML3    SAXON
mo.xsl       1352     3155     2564    61950     2379    10451     3985
sh.xsl        250     1713      ***     6205      655     1787      681
n-s.xsl      1041     1321     1201     4897    1065*     2243     2825

*   = wrong output
*** = couldn't compile

Processors:
  AXXEL/1 - my project: XSLT compiled to Java source code, output fully
            suppressed (JVM)
  AXXEL/3 - my project: XSLT compiled to Java source code, with output (JVM)
  XSLTC   - XSLT to Java bytecode, found in Xalan (JVM)
  SAXON   - SAXON 6.5.2 (JVM)
  XALAN   - XALAN 2.3.1 (JVM)
  MSXML3  - Microsoft MSXML 3.0
  MSXML4  - Microsoft MSXML 4.0

Tests:
  mo.xsl  - an XML2HTML presentation sheet, fairly complex (a lot of templates
            and a lot of modes). Artificially run 100 times (the main
            template runs the stylesheet 100 times, without re-parsing the
            input XML)
  sh.xsl  - an XML2HTML presentation sheet, quite simple. Run internally 100
            times, except for MSXML3 and MSXML4 (I don't remember why, but it
            didn't work), for which the time for executing once was
            multiplied by 100
  n-s.xsl - (number-string.xsl) an artificial stylesheet, to test the
            computation of the string value of a node (i.e. how fast you
            compute string(/) ) and the speed of normalize-space

For the Java processors, JDK 1.4.0 was used (HotSpot client). The time was
measured after the HotSpot compiler did its job (simulating a server-side
environment).

I must admit, the tests were performed with mid-2002 software, but as you can
see, Xalan is far worse than anything else tested, MSXML 4.0 works great
(written in C++), and SAXON is very close behind (although it is written in
Java). Xalan was 10 to 15 times slower than SAXON (on real stylesheets).
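
The "10 to 15 times" figure can be checked against the timings quoted above. The short Python sketch below only divides the XALAN column by the SAXON column; the dictionary layout and the `slowdown` helper are illustrative names, not part of the original tests.

```python
# Timings copied from the results quoted above (units as reported there).
timings = {
    "mo.xsl": {"XALAN": 61950, "SAXON": 3985},
    "sh.xsl": {"XALAN": 6205, "SAXON": 681},
    "n-s.xsl": {"XALAN": 4897, "SAXON": 2825},
}


def slowdown(row, slow="XALAN", fast="SAXON"):
    """Return how many times slower `slow` is than `fast` for one stylesheet."""
    return row[slow] / row[fast]


# Ratio per stylesheet, rounded to one decimal place.
ratios = {name: round(slowdown(row), 1) for name, row in timings.items()}
print(ratios)  # {'mo.xsl': 15.5, 'sh.xsl': 9.1, 'n-s.xsl': 1.7}
```

The two presentation stylesheets (mo.xsl and sh.xsl) give roughly 15.5x and 9.1x, which is where the "real stylesheets" range comes from; the artificial n-s.xsl test shows a much smaller gap.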

What I also found out is that Java is not great at I/O in XSLT
transformations: file manipulation and string manipulation are quite slow.
Maybe things have changed in 2.5 years, but I doubt that the people at the
Apache foundation learned how to write fast software. The latest release of
Xalan is 2.6 and the latest release of Saxon is 8.1.1. The latest release of
MSXML is still 4.0. I also bet that they didn't change much in XSLTC.

About the big XML issue: I recommend that you not expect any magic from an
XSLT processor (like efficient incremental parsing) and keep all your XML
files small by dividing the information across more than one XML file (which
you can later access using the "document" function). For example, you may move
the mail content into a separate XML file if you don't access this info too
often. In my experience, any XML over 3 or 5 MB is a bad XML.

Moreover, don't expect that once you have used an external XML (via the
"document" function) and no longer hold any reference to it, the XSLT
processor will free the tree for that external XML.

=====
Marian
http://www.utdallas.edu/~mgo031000/
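
A minimal sketch of the splitting advice above, assuming the mail-merge layout from the quoted question. It is written in Python rather than XSLT because the split has to happen before the stylesheet runs. The function name `split_mailmerge`, the output file names, and the `chunk_size` parameter are all hypothetical choices for illustration. The input is read incrementally with `iterparse`, so the full list of recipients is never held in memory at once; a stylesheet could then load `letter.xml` through the "document" function while processing one recipients file at a time.

```python
import os
import xml.etree.ElementTree as ET


def split_mailmerge(source, out_dir, chunk_size=1000):
    """Split one large mail-merge file into letter.xml and recipients-NNNN.xml."""
    written = []   # paths of the files produced, in order
    pending = []   # complete <recipient> elements not yet written
    parent = None  # the open <recipients> element, once seen
    chunk_no = 0

    def write(elem, name):
        path = os.path.join(out_dir, name)
        ET.ElementTree(elem).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)

    def flush():
        nonlocal chunk_no
        if not pending:
            return
        chunk_no += 1
        chunk = ET.Element("recipients")
        chunk.extend(pending)
        write(chunk, "recipients-%04d.xml" % chunk_no)
        for elem in pending:
            elem.clear()    # free the recipient data just written
        pending.clear()
        if parent is not None:
            parent.clear()  # detach processed children from the parse tree

    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "recipients":
                parent = elem
        elif elem.tag == "letter":
            write(elem, "letter.xml")
            elem.clear()
        elif elem.tag == "recipient":
            pending.append(elem)
            if len(pending) >= chunk_size:
                flush()
        elif elem.tag == "recipients":
            flush()         # write whatever is left at the end of the list
    return written
```

Calling `split_mailmerge("mailmerge.xml", "out", chunk_size=1000)` (file and directory names are placeholders) would write `out/letter.xml` plus one `recipients-NNNN.xml` file per thousand recipients. The chunk size is a tuning knob: smaller files keep each transformation's tree small, at the cost of more files to manage.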