Re: [xsl] Processing based on number - alternatives to recursion?

Subject: Re: [xsl] Processing based on number - alternatives to recursion?
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Tue, 4 Mar 2008 21:39:22 -0800
> > Lots of stack space is what I thought. Does tail recursion mean there
> > is nothing left to process in the current round when the template
> > recursively invokes itself?
>
> more or less. googling for that phrase will yield arbitrary amounts of
> detail:-) Basically there are theorems that tell you that certain
> classes of problem can be written in tail recursive style (and certain
> classes can not) and that those that are tail recursive can essentially
> be implemented as a loop. this means that compilers can "compete" in how
> good a job they can do at optimising tail recursive calls by rewriting
> functions in tail recursive style even if the end user has not coded it
> that way explictly.


Not all XSLT processors recognize and optimize tail-recursion.
Moreover, sometimes writing in the tail-recursive style may require
quite a bit of additional effort.

A technique, which doesn't rely on support from individual XSLT
processors, and which does not eliminate recursion entirely, but
allows for a much more shallow recursion is called the Divide and
Conquer recursion. In its simplest form it requires that:

  0. A set of one or two items is processed without recursion,
according to the rules of the problem being solved.

For sets of more than 2 items:

  1. All items to be processed are divided into two (almost equal in
size) sets Set1 and Set2.

  2. Then Set1 and Set2 are independently processed. This is the
recursion step, but notice that each recursion step is performed on
half as less items than the size of the current set of items.

  3. The results of processing Set1 and Set2 are combined, if needed.


As can be easily seen, when processing a set of N items the maximum
recursion depth is Log2(N).

This means, that to process a set of 1000000 (1M) items a maximum
stack depth of only 19 will be needed.


DVC is best illustrated with a simple example. Let's find the maximum
of a sequence of integers:

This stylesheet:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
 xmlns:xs="http://www.w3.org/2001/XMLSchema";
 xmlns:f="http://fxsl.sf.net/";
 exclude-result-prefixes="f xs"
 >

 <xsl:output method="text"/>

 <xsl:template match="/">
   <xsl:sequence select="f:dvcMax(1 to 100000)"/>
 </xsl:template>

 <xsl:function name="f:dvcMax" as="xs:integer">
   <xsl:param name="pSeq" as="xs:integer+"/>

   <xsl:sequence select=
    "for $vSize in count($pSeq)
       return
         if($vSize eq 1) then $pSeq[1]
         else
           for $vhalfSize in $vSize idiv 2,
               $vMax1 in f:dvcMax(subsequence($pSeq,1,$vhalfSize)),
               $vMax2 in f:dvcMax(subsequence($pSeq,$vhalfSize+1))
              return
                 if($vMax1 gt $vMax2) then $vMax1
                    else $vMax2
    "
    />
 </xsl:function>


</xsl:stylesheet>

when applied on any xml document (not needed and ignored),

produces the wanted result:

100000

in 735 milliseconds.

Notice that the recursive processing is specified just in 6 lines.

Also notice what a neat parallel processing oportunity presents the
split into two sets and their processing:

               $vMax1 in f:dvcMax(subsequence($pSeq,1,$vhalfSize)),
               $vMax2 in f:dvcMax(subsequence($pSeq,$vhalfSize+1))


These facts show that in many cases using DVC recursion might be more
efficient than (simple) tail recursion.

Another example, which demonstrates calculating integer power in a
very fast way (the two sets in this case are identical and we process
only one of them!):

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
 xmlns:xs="http://www.w3.org/2001/XMLSchema";
 xmlns:f="http://fxsl.sf.net/";
 exclude-result-prefixes="f xs"
 >
 <xsl:output method="text"/>

 <xsl:template match="/">
   <xsl:sequence select="f:nPow(1.000001, 1000000)"/>
 </xsl:template>

 <xsl:function name="f:nPow" as="xs:double">
   <xsl:param name="pBase" as="xs:double"/>
   <xsl:param name="pPower" as="xs:integer"/>

   <xsl:sequence select=
    "if($pPower eq 0) then 1
      else if($pPower ge 1)
      then
        for $vhalfPower in $pPower idiv 2
            return
			        if($pPower mod 2 eq 0)
			              then
			                for $vresPower1 in f:nPow($pBase, $vhalfPower)
			                    return $vresPower1 * $vresPower1
			              else
			                $pBase * f:nPow($pBase, $pPower -1)
      else
         error()
    "
    />
 </xsl:function>

</xsl:stylesheet>

When this transformation is applied (again on any xml document, which
will be ignored), the result:

2.7182804691319364

is obtained in just anywhere between 0 and 16 milliseconds. (of the
three timings one was 0 and the other 15 milliseconds).


-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play

On Tue, Mar 4, 2008 at 9:19 AM, David Carlisle <davidc@xxxxxxxxx> wrote:
>
>
>
> >  Supplying the stylesheet file name
> > as a parameter to the stylesheet and calling document() on it?
>
>  <xsl:variable name="here" select="."/>
> <xsl:for-each select="(document('')//*)[position()&lt; 10]">
>  <xsl:variable name="p" select="position()"/>
>       code using $here and $p as needed...
>
> note this looks short but can easily be less efficient than the
> recursive template version as it implies loading the stylesheet and
> parsing the entire thing just to get a bunch of nodes to iterate over.
> (For various reasons the system will probably need to reparse the
> stylesheet as an input doc, not use the in-memory tree resulting from
> the initial parse of the stylesheet). If on the other hand you know your
> input doc will be big enough, you already have those nodes to hand so
> <xsl:for-each select="(//*)[position()&lt; 10]">
> is probably fairly cheap.
>
>
>
> > Lots of stack space is what I thought. Does tail recursion mean there
> > is nothing left to process in the current round when the template
> > recursively invokes itself?
>
> more or less. googling for that phrase will yield arbitrary amounts of
> detail:-) Basically there are theorems that tell you that certain
> classes of problem can be written in tail recursive style (and certain
> classes can not) and that those that are tail recursive can essentially
> be implemented as a loop. this means that compilers can "compete" in how
> good a job they can do at optimising tail recursive calls by rewriting
> functions in tail recursive style even if the end user has not coded it
> that way explictly. The factorial function is (always;) the standard
> example.
>
> 5! is 5*4*3*2*1
>
> if you code it as
>
> fac(n) = n * fac(n-1) if n > 1 else 1
>
> then that is not tail recursive and the  fac(5) ends up with a stack of
> all 5 function calls being partially evaluated until you finally get down
> to fac(1) and you pop the stack (or run out of stack:-)
>
> if instead you do
>
> fac(n)= fac(n,1)
> fac(n,r)=fac(n-1,n*r) if n>1 else r
>
> then the intermediate results are built up in the parameter list and so
> you don't need to preserve the function call stack you can just
> reuse the same space at each iteration (in otherwords it's essentially a
> loop).
>
>
>
>
> > Does the call have to be placed at the end of the recursive template
> > in order for this to happen?
>
> yes for some definition of "end" for example the recursive call inside a
> conditional block ought to count as "end" (you normally need that to
> terminate recursion). so in xslt the recursive call might not actually be
> the last line, there may be various /xsl:when/xsl:choose etc.
>
>
> David
>
> ________________________________________________________________________
> The Numerical Algorithms Group Ltd is a company registered in England
> and Wales with company number 1249803. The registered office is:
> Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.
>
> This e-mail has been scanned for all viruses by Star. The service is
> powered by MessageLabs.
> ________________________________________________________________________

Current Thread