## [xsl] Re: Re: Re: Re: Unbounded element grouping/concatenation

 Subject: [xsl] Re: Re: Re: Re: Unbounded element grouping/concatenation From: "Dimitre Novatchev" Date: Fri, 12 Dec 2003 11:50:33 +0100
```"Gupta, Raman K [CI]" <raman.k.gupta@xxxxxxxxxxxxx> wrote in message
news:C0CB45C72DDE2A4FA32370AB65CAEC8F059673@xxxxxxxxxxxxxxxxxxxxxxxxxx

[Nice timing results omitted]

> So all of the methods except recursion exhibit exponential
> behavior with an increase in continuation records except
> recursion.
>
> So if there is a way to do solve this problem recursively
> that would avoid the stack overflow errors (2.5.2 doesn't
> even do you the courtesy of throwing an exception -- it
> just dies in the middle of the transformation), that would
> still be by far the best solution.
>

Hi Raman,

I have both bad and good news for you.

The recursive algorithm isn't the fastest.

The Good News:
I am enclosing here the code implementing the fastest (that I
know of) algorithm, anf it is non-recursive.

The idea is to get a string with the positions of all "record" nodes with
type="normal" among all "record nodes", then for a record with type="normal"
find its position in the string and also the position of the next record
with type="normal" The difference in these two positions - 1 is the number
of intermediate record nodes with type="continuation".

Here's the XSLT code:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
>

<xsl:output omit-xml-declaration="yes" indent="yes"/>

<xsl:variable name="vposArray">
<xsl:value-of select="'|'"/>
<xsl:for-each select="/*/record">
<xsl:if test="@type = 'normal'">
<xsl:value-of select="concat(position(), '|')"/>
</xsl:if>
</xsl:for-each>
</xsl:variable>

<xsl:template match="@* | node()" name="identity">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>

<xsl:template match="records">
<records>
<xsl:apply-templates select="record"/>
</records>
</xsl:template>

<xsl:template match="record">
<xsl:choose>
<xsl:when test="not(@type='normal')">
<xsl:call-template name="identity"/>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="vPos" select="position()"/>
<xsl:variable name="vposNext"
select="substring-before(
substring-after(\$vposArray,
concat('|',
position(),
'|'
)
),
'|'
)"/>
<xsl:variable name="vNumNested"
select="\$vposNext - position() - 1"/>
<xsl:copy>
<xsl:copy-of select="@* | node()"/>
<xsl:if test="\$vNumNested > 0">
<xsl:copy-of select=
"following-sibling::record
[position() &lt;= \$vNumNested]"/>
</xsl:if>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

<xsl:template match="record[not(@type='normal')]"/>
</xsl:stylesheet>

I also measured the time it takes with the three different algorithms
(recursive, with keys, non-recursive) with two different sorce xml
documents -- one having 1000 consecutive record nodes with
type="continuation", the other with 2000 such nodes. They look like this:

<records>
<record n="1" type="normal"/>
<record n="2" type="normal"/>
<record n="3" type="continuation"/>
<record n="4" type="continuation"/>
<record n="5" type="continuation"/>
<record n="6" type="continuation"/>
<record n="7" type="continuation"/>
..............................................................
<record n="1000" type="continuation"/>
<record n="1001" type="continuation"/>
<record n="1002" type="continuation"/>
<record n="1003" type="normal"/>
<record n="1004" type="normal"/>
</records>

and

<records>
<record n="1" type="normal"/>
<record n="2" type="normal"/>
<record n="3" type="continuation"/>
<record n="4" type="continuation"/>
<record n="5" type="continuation"/>
<record n="6" type="continuation"/>
<record n="7" type="continuation"/>
..............................................................
<record n="2000" type="continuation"/>
<record n="2001" type="continuation"/>
<record n="2002" type="continuation"/>
<record n="2003" type="normal"/>
<record n="2004" type="normal"/>
</records>

I tested the following XSLT processors: MSXML4, Saxon6.5.3, JD1.5.2,
XalanJ2.4.1, XalanC1.5

I also wanted to test a forth algorithm using generate-id(). It took MSXML4
233157 milliseconds on the 1000 nodes document and I had to kill it after 30
minutes with the 2000 nodes document. I did not attempt to test this with
the rest of the XSLT processors.

Here are the results:

Method                    MSXML4       Saxon               JD
XalanJ            XalanC
=============================================================
Recursive:
1000 nodes                     37     Stack Ovfl.        150
Stack Ovfl.               170
2000 nodes                   519     Stack Ovfl.   Stack Ovfl.   Stack
Ovfl.               621

Keys:
1000 nodes                     44            1783           791
4677              1472
2000 nodes                 2211            6890         2654
17466              5818

Non-Recursive:
1000 nodes                    5.9                50           100
1152                   10
2000 nodes                12.09              101           120
1342                   40

The non-recursive algorithm exhibits linear behaviour with MSXML4 and Saxon
and sub-linear! one with JD, XalanJ and XalanC.

The recursive and key-based algorithm exhibit quadratic behaviour.

XalanJ is clearly not in the same company as the rest of the tested
processors.

I could try still speeding up the non-recursive algorithm, by using a faster
search than linear to find the position of a record node in the string with
positions -- this will require that all positions must have the same (some
maximum) length. Or I could record the positions in a node-set, for which
binary search is straight-forward.

In case you are still not satisfied with the speed of the non-recursive
algorithm, just let me know :o)

Hope this helped.

=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL

XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

```