Subject: Re: [xsl] Sibling axis: All of kind A up to first of kind B From: Michael Ludwig <mlu@xxxxxxxxxxxxx> Date: Wed, 26 Mar 2008 22:03:44 +0100 |
No, implementations are not required to be smart. I think that any respectable implementation(*) is likely to implement the predicate [1] with a modicum of intelligence, that is, to stop scanning when the first node is reached. Anything else is going to vary widely from one product to another.
(*) Actually, implementations are not required to be respectable either.
Thanks a lot for your answer. I submitted various respectable implementations to some naive benchmarking on my laptop equipped with a Intel(R) Core(TM)2 CPU T5500 @ 1.66GHz and running Linux.
* xsltproc (C): Using libxml 20627, libxslt 10120 and libexslt 813 * Xalan (C++): Xalan version 1.10.0, Xerces version 2.7.0 * Saxon (Java): /usr/local/jlib/saxon9-0-0-2/saxon9.jar Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)
The benchmark consisted of timing processing speed (using the Bash function "time") for relatively large input documents using each of the templates featured in my original post in this thread, the first one being identified by "upto", the second one by "siblrec".
The input documents have the structure "/doc/h1/h2/p". I tested several versions. In each case, there are 25 h1 elements. Then:
* in case (1): 25 h2 elements per h1, 25 p elements per h2; * in case (2): 50 h2 elements per h1, 50 p elements per h2; * in case (3): 75 h2 elements per h1, 75 p elements per h2. * in case (4): 100 h2 elements per h1, 100 p elements per h2. * in case (5): 1000 h2 elements per h1, 10 p elements per h2.
(1) 2.1 MB document, 16276 element nodes, 32551 text nodes (2) 8.4 MB, 63776 element nodes, 127551 text nodes (3) 19 MB, 142526 element nodes, 285051 text nodes (4) 34 MB, 252526 element nodes, 505051 text nodes (5) 34 MB, 275026 element nodes, 550051 text nodes
XML-wise, the output was the same regardless of processor or stylesheet. Just small variations with regard to indenting.
processor template first second third run --------- -------- ----- ------ --------- (1) xsltproc upto 0.29 0.29 0.29 xalan upto 5.44 5.40 5.43 saxon upto 1.19 1.20 1.31 xsltproc siblrec 0.57 0.59 0.58 xalan siblrec 0.95 0.94 0.96 saxon siblrec 1.11 1.11 1.12 --------- -------- ----- ------ --------- (2) xsltproc upto 1.38 1.39 1.42 xalan upto 45.04 44.40 44.49 saxon upto 2.45 2.42 2.56 xsltproc siblrec 4.31 4.46 4.16 xalan siblrec 6.30 6.35 6.33 saxon siblrec 2.09 2.01 1.97 --------- -------- ----- ------ --------- (3) xsltproc upto 3.65 3.71 3.73 xalan upto 149.44 151.50 150.25 saxon upto 4.77 4.75 4.87 xsltproc siblrec 13.38 13.39 12.92 xalan siblrec 20.07 20.08 19.94 saxon siblrec 3.39 3.36 3.43 --------- -------- ----- ------ --------- (4) xsltproc upto 8.76 8.82 8.74 xalan upto - - - not viable saxon upto 8.52 8.40 8.46 xsltproc siblrec 30.31 30.46 29.19 xalan siblrec 45.25 46.95 45.71 saxon siblrec 5.45 5.49 5.48 --------- -------- ----- ------ --------- (5) * see below *
I didn't really look at memory consumption. But for document (4), I had to allow "java -Xmx96M" in order for Saxon to avoid the "java.lang.OutOfMemoryError". LibXSLT used up to 160 MB, Xalan only about 30 MB, and Java/Saxon about 115 MB. So memory-wise, Xalan appears to do best.
It seems, xsltproc/LibXSLT prefers the "upto" template whereas Xalan and Saxon perform better using sibling recursion. Xalan even seems to dislike the "upto" approach. Saxon performs most evenly.
Consider that the timings for Saxon include JVM startup! And despite that overhead with regard to the C/C++ binaries of xsltproc and xalan, Saxon outperforms even LibXSLT for large inputs. Terrific! Saxon seems to scale best with regard to both input size and programming variations. (Well, at least for the two programs I fed it.)
I think this is likely to be more efficient. But with large input, it will run out of stack space (typically after 500 or so calls) unless the processor implements tail-call optimization, which not all do.
Michael Kay
Processing document (5) is difficult. It weighs 34 MB, has 275026 element nodes and 550051 text nodes. Having 1000 h2 nodes means the context size is much higher where most of the processing happens.
Running out of stack space didn't happen. It may happen, but I terminated some of the processes because they took too long.
(5) xsltproc upto 67.30 - - xalan upto - - - not viable saxon upto 33.17 - - xsltproc siblrec - - - not viable xalan siblrec - - - not viable saxon siblrec 5.80 5.76 5.81
It's interesting to see how processing speed and memory consumption do indeed - as you said - vary widely based on processor and programming method. I think these are all very good processors. From my tests, overall, your Saxon processor seems to perform best. But as stated, my tests are likely to be pretty naive.
PS: Here's the program I used to generate the input documents for testing - in case someone wants to follow up ... Simply pass it an input document like "<Urmel/>" - the input document isn't actually needed. You can parametrize document generation tweaking the @count attributes.
<xsl:transform version="1.0" xmlns:exsl="http://exslt.org/common" extension-element-prefixes="exsl" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/"> <doc> <xsl:variable name="start-elm" select="$bla/*[1]"/> <xsl:call-template name="dodo"> <xsl:with-param name="elm" select="$start-elm"/> <xsl:with-param name="count" select="$start-elm/@count"/> </xsl:call-template> </doc> </xsl:template>
<xsl:template name="dodo"> <xsl:param name="elm"/> <xsl:param name="count"/> <xsl:variable name="txt" select="$bla/*[local-name() = name($elm)]"/> <xsl:element name="{name($elm)}"> <xsl:value-of select="$txt"/> <xsl:text> </xsl:text> <xsl:value-of select="$txt"/> </xsl:element> <xsl:variable name="next-elm" select="$elm/following-sibling::*[1]"/> <xsl:if test="$next-elm"> <xsl:call-template name="dodo"> <xsl:with-param name="elm" select="$next-elm"/> <xsl:with-param name="count" select="$next-elm/@count"/> </xsl:call-template> </xsl:if> <xsl:if test="$count > 1"> <xsl:call-template name="dodo"> <xsl:with-param name="elm" select="$elm"/> <xsl:with-param name="count" select="$count - 1"/> </xsl:call-template> </xsl:if> </xsl:template>
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
RE: [xsl] Sibling axis: All of kind, Michael Kay | Thread | Re: [xsl] Sibling axis: All of kind, Michael Ludwig |
RE: [xsl] Tokenized String Sorting , Scott Lynch | Date | Re: [xsl] Sibling axis: All of kind, Michael Ludwig |
Month |