[xsl] Re: Re: Re: Re: Unbounded element grouping/concatenation

Subject: [xsl] Re: Re: Re: Re: Unbounded element grouping/concatenation
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Fri, 12 Dec 2003 11:50:33 +0100
"Gupta, Raman K [CI]" <raman.k.gupta@xxxxxxxxxxxxx> wrote in message
news:C0CB45C72DDE2A4FA32370AB65CAEC8F059673@xxxxxxxxxxxxxxxxxxxxxxxxxx

[Nice timing results omitted]

> So all of the methods except recursion exhibit exponential
> behavior with an increase in continuation records except
> recursion.
>
> So if there is a way to do solve this problem recursively
> that would avoid the stack overflow errors (2.5.2 doesn't
> even do you the courtesy of throwing an exception -- it
> just dies in the middle of the transformation), that would
> still be by far the best solution.
>

Hi Raman,

I have both bad and good news for you.

The Bad News:
              The recursive algorithm isn't the fastest.

The Good News:
              I am enclosing here the code implementing the fastest (that I
know of) algorithm, anf it is non-recursive.


The idea is to get a string with the positions of all "record" nodes with
type="normal" among all "record nodes", then for a record with type="normal"
find its position in the string and also the position of the next record
with type="normal" The difference in these two positions - 1 is the number
of intermediate record nodes with type="continuation".

Here's the XSLT code:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
 >

  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:variable name="vposArray">
    <xsl:value-of select="'|'"/>
    <xsl:for-each select="/*/record">
      <xsl:if test="@type = 'normal'">
        <xsl:value-of select="concat(position(), '|')"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="@* | node()" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="records">
    <records>
      <xsl:apply-templates select="record"/>
    </records>
  </xsl:template>

  <xsl:template match="record">
    <xsl:choose>
      <xsl:when test="not(@type='normal')">
        <xsl:call-template name="identity"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="vPos" select="position()"/>
        <xsl:variable name="vposNext"
        select="substring-before(
                          substring-after($vposArray,
                                          concat('|',
                                                 position(),
                                                 '|'
                                                 )
                                          ),
                          '|'
                                )"/>
         <xsl:variable name="vNumNested"
           select="$vposNext - position() - 1"/>
         <xsl:copy>
           <xsl:copy-of select="@* | node()"/>
           <xsl:if test="$vNumNested > 0">
             <xsl:copy-of select=
               "following-sibling::record
                           [position() &lt;= $vNumNested]"/>
           </xsl:if>
         </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="record[not(@type='normal')]"/>
</xsl:stylesheet>


I also measured the time it takes with the three different algorithms
(recursive, with keys, non-recursive) with two different sorce xml
documents -- one having 1000 consecutive record nodes with
type="continuation", the other with 2000 such nodes. They look like this:

<records>
   <record n="1" type="normal"/>
   <record n="2" type="normal"/>
   <record n="3" type="continuation"/>
   <record n="4" type="continuation"/>
   <record n="5" type="continuation"/>
   <record n="6" type="continuation"/>
   <record n="7" type="continuation"/>
 ..............................................................
   <record n="1000" type="continuation"/>
   <record n="1001" type="continuation"/>
   <record n="1002" type="continuation"/>
   <record n="1003" type="normal"/>
   <record n="1004" type="normal"/>
</records>

and

<records>
   <record n="1" type="normal"/>
   <record n="2" type="normal"/>
   <record n="3" type="continuation"/>
   <record n="4" type="continuation"/>
   <record n="5" type="continuation"/>
   <record n="6" type="continuation"/>
   <record n="7" type="continuation"/>
 ..............................................................
   <record n="2000" type="continuation"/>
   <record n="2001" type="continuation"/>
   <record n="2002" type="continuation"/>
   <record n="2003" type="normal"/>
   <record n="2004" type="normal"/>
</records>


I tested the following XSLT processors: MSXML4, Saxon6.5.3, JD1.5.2,
XalanJ2.4.1, XalanC1.5

I also wanted to test a forth algorithm using generate-id(). It took MSXML4
233157 milliseconds on the 1000 nodes document and I had to kill it after 30
minutes with the 2000 nodes document. I did not attempt to test this with
the rest of the XSLT processors.

Here are the results:


Method                    MSXML4       Saxon               JD
XalanJ            XalanC
=============================================================
Recursive:
    1000 nodes                     37     Stack Ovfl.        150
Stack Ovfl.               170
    2000 nodes                   519     Stack Ovfl.   Stack Ovfl.   Stack
Ovfl.               621

Keys:
    1000 nodes                     44            1783           791
4677              1472
    2000 nodes                 2211            6890         2654
17466              5818

Non-Recursive:
    1000 nodes                    5.9                50           100
1152                   10
    2000 nodes                12.09              101           120
1342                   40



The non-recursive algorithm exhibits linear behaviour with MSXML4 and Saxon
and sub-linear! one with JD, XalanJ and XalanC.

The recursive and key-based algorithm exhibit quadratic behaviour.

XalanJ is clearly not in the same company as the rest of the tested
processors.


I could try still speeding up the non-recursive algorithm, by using a faster
search than linear to find the position of a record node in the string with
positions -- this will require that all positions must have the same (some
maximum) length. Or I could record the positions in a node-set, for which
binary search is straight-forward.

In case you are still not satisfied with the speed of the non-recursive
algorithm, just let me know :o)

Hope this helped.


=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL









 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread