[xsl] Re: Statistics - Calculating Standard Deviation

Subject: [xsl] Re: Statistics - Calculating Standard Deviation
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Fri, 13 Jun 2003 19:17:33 +0200
"Andrew Welch" <AWelch@xxxxxxxxxxxxxxx> wrote in message
news:3BAAB77DB787FC4C961601D815DAF1E50E6C41@xxxxxxxxxxxxxxxxxxxxxxxx
> > The performance is the thing that is worrying me most.  Ideally the
> > target processor is MSXML 4.0, but that is open to negotiation...
>
> Well, using Saxon 7.x (use the latest) and exslt/math you could use the
> following simple stylesheet.  I'm just wondering how much of this can be
> done using straight XSLT 2 now... Is there a square root function? I had a
> quick look but didn't see anything.
>

The solution I posted earlier today runs OK without any modifications in
XSLT 2.0 (Saxon 7.5):

http://aspn.activestate.com/ASPN/Mail/Message/XSL-List/1670297

>
>
>
> <?xml version="1.0"?>
> <xsl:stylesheet version="1.0"
>   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>   xmlns:exsl="http://exslt.org/math">
>
> <xsl:variable name="mean" select="sum(/root/node) div count(/root/node)"/>
>
> <xsl:variable name="diffs">
>   <root>
>     <xsl:for-each select="/root/node">
>       <node squaredDiff="{exsl:power($mean - .,2)}">


Why is the extension function necessary here? Multiplying a number by itself
in pure XSLT is unlikely to be any slower.
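
As a minimal sketch of the pure-XSLT alternative (assuming the same
/root/node input as the quoted stylesheet; the output layout is only
illustrative), the squared difference can be written with the ordinary
multiplication operator, so no extension namespace is needed for this step:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:variable name="mean" select="sum(/root/node) div count(/root/node)"/>

  <!-- Emit each node with its squared deviation from the mean,
       computed with the built-in * operator instead of exsl:power() -->
  <xsl:template match="/">
    <root>
      <xsl:for-each select="/root/node">
        <node squaredDiff="{($mean - .) * ($mean - .)}">
          <xsl:copy-of select="."/>
        </node>
      </xsl:for-each>
    </root>
  </xsl:template>

</xsl:stylesheet>
```

Whether this is actually faster than the extension call depends on the
processor; it only removes the dependency on exslt/math for the squaring.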


>          <xsl:copy-of select="."/>
>       </node>
>     </xsl:for-each>
>   </root>
> </xsl:variable>
>
> <xsl:variable name="mean.Of.Sum.Of.Diffs">
>   <xsl:for-each select="$diffs">
>     <xsl:value-of select="sum(/root/node/@squaredDiff) div (count(/root/node) - 1)"/>
>   </xsl:for-each>
> </xsl:variable>
>
> <xsl:template match="/">
>   standard deviation: <xsl:value-of select="exsl:sqrt(number($mean.Of.Sum.Of.Diffs))"/>
> </xsl:template>
>
> </xsl:stylesheet>


This solution will use 2 * N units of memory, which may limit its
applicability, especially when processing long node-sets.
It may also require from three to five traversals of a node-set of length N:
one each for sum() and count() over the initial node-set, one for the
xsl:for-each that builds $diffs, and one each for sum() and count() over
$diffs.

An advantage (in efficiency) is that it does not require any recursion.

However, I guess it would be much more efficient if sequences were
used/built instead of node-sets.
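
A rough sketch of what a sequence-based version might look like, assuming an
XSLT 2.0 processor that implements the XPath 2.0 "for" expression and the
same /root/node input (the variable names are only illustrative). It stops at
the sample variance: XPath 2.0 itself has no square root function, so the
final step would still need an extension function such as the EXSLT one in
the quoted stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- A sequence of numbers rather than a temporary tree of copied nodes -->
    <xsl:variable name="values"
      select="for $x in /root/node return number($x)"/>
    <xsl:variable name="n" select="count($values)"/>
    <xsl:variable name="mean" select="sum($values) div $n"/>

    <!-- Squared deviations, again as a plain sequence of numbers -->
    <xsl:variable name="sqDiffs"
      select="for $v in $values return ($v - $mean) * ($v - $mean)"/>

    <!-- Sample variance; its square root is the standard deviation -->
    <xsl:text>variance: </xsl:text>
    <xsl:value-of select="sum($sqDiffs) div ($n - 1)"/>
  </xsl:template>

</xsl:stylesheet>
```

No recursion is needed here either, and no intermediate elements or
attributes are constructed, though how much this saves in practice will
depend on the processor.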


=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

