[xsl] Performance improvement for a recursive function?

Subject: [xsl] Performance improvement for a recursive function?
From: Manfred Staudinger <manfred.staudinger@xxxxxxxxx>
Date: Tue, 13 Dec 2011 01:07:06 +0100

I need some help to improve the performance of a function, as I have
run out of ideas what to change!

In the process to convert some data to SVG I transform (or even break
up) @Data attribute of the Path element and then use the function
my:compose-path-data [1] to put the parts together into a string
again. The attribute string is a succession of path commands, where
each path command  (see the global variable below) consists of a one
byte character followed by zero, one, or two integers. Having about
2200 Path elements, most of them have a @Data attribute with between 5
an 50 path commands.

A few attribute strings are bigger (up to 1MB) and contain up to
100000 path commands. As the function is called for each path command
it is tail recursive, but the performance is still a big problem: to
process 41337 path commands it takes 861 sec! The first 5000 path
commands are processed in 14 sec (at 355.87 per sec) the last 5000
take 671 sec (only at 7.5 per sec).

What Saxon reports [2] is consistent with the above. In case you want
to try it, I have uploaded the specific test case here:

Using Saxon-HE and Java version 1.6.0_14 on Windows. XP Home.
Hardware ASUS Eee PC 1000H with Intel. Atom N270 (1.60 GHz)


<xsl:function name="my:compose-path-data" as="xs:string*">
   <xsl:param name="c" as="xs:integer*"/>
   <xsl:param name="i-c" as="xs:integer"/>
   <xsl:param name="xy" as="xs:integer*"/>
   <xsl:param name="i-xy" as="xs:integer"/>
   <xsl:param name="i-end" as="xs:integer"/>
   <xsl:param name="result" as="xs:string?"/>
   <xsl:variable name="cmd" select="key('path', $c[$i-c], $path-cmd)"
   <xsl:sequence select="if ($i-c gt $i-end)
      then $result
      else my:compose-path-data($c, $i-c + 1,
         $xy, $i-xy + xs:integer($cmd/n),
         concat($result, $cmd/@char, if ($cmd/n=2)
            then (: M, m, l, (blank) :) concat($xy[$i-xy], ',', $xy[$i-xy+1])
            else  if ($cmd/n=1)
               then (: h, v :) string($xy[$i-xy])
               else (: z :) ())

and on the stylesheet level
<xsl:variable name="path-cmd">
      <cmd char=" "><n>2</n><code>32</code></cmd>
      <cmd char="M"><n>2</n><code>77</code></cmd>
      <cmd char="l"><n>2</n><code>108</code></cmd>
      <cmd char="m"><n>2</n><code>109</code></cmd>
      <cmd char="h"><n>1</n><code>104</code></cmd>
      <cmd char="v"><n>1</n><code>118</code></cmd>
      <cmd char="z"><n>0</n><code>122</code></cmd>
<xsl:key name="path" match="cmd" use="xs:integer(code)"/><!-- $path-cmd -->

Stylesheet compilation time: 4218 milliseconds
Using parser org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser
Tree built in 1375 milliseconds
Tree size: 4 nodes, 0 characters, 11 attributes
Execution time: 14m 31.579s (871579ms)
Memory used: 36559128
NamePool contents: 33 entries in 33 chains. 8 prefixes, 8 URIs

Current Thread