Subject: Re: [xsl] A superefficient way to compute the sum of A[i] * B[i] for i=1 to n? From: "Michael Kay mike@xxxxxxxxxxxx" <xsllistservice@xxxxxxxxxxxxxxxxxxxxxx> Date: Wed, 13 May 2020 08:48:31 0000 
The cost of creating a thread (and handling its completion) here is far greater than the cost of the computation you are doing inside the thread (a single multiplication) so this result is no surprise. Multithreading only pays off when the amount of work done in each thread is quite substantial (which is why it's a feature that's only used if explicitly requested). If you really want to get a speedup from multithreading here, then it might be worth trying something along the lines <xsl:variable name="batchSize" select="count($v1) idiv $THREADS"/> <xsl:variable name="subtotals"> <xsl:foreach select="1 to $THREADS" saxon:threads="{$THREADS}"> <xsl:sequence select="sum(((.  1) * $batchSize) + 1) to (. * $batchSize)) ! $v1[current()] * $v2[current()]"/> </xsl:foreach </xsl:variable> <xsl:sequence select="sum($subtotals)"/> (But there's probably a couple of offbyone bugs there...) To examine the difference between PE and EE we would need to study the detail (with performance, the devil is always in the detail). It may be due to bytecode generation; although hotspot bytecode generation was introduced in 9.8, it's still sometimes the case that the cost of generating bytecode exceeds the benefit. The TB command line option helps when studying this effect. Michael Kay Saxonica > On 13 May 2020, at 04:28, Dimitre Novatchev dnovatchev@xxxxxxxxx <xsllistservice@xxxxxxxxxxxxxxxxxxxxxx> wrote: > > I played a little bit using saxon:threads: https://www.saxonica.com/html/documentation/extensions/attributes/threads.html <https://www.saxonica.com/html/documentation/extensions/attributes/threads.ht ml> , with the hope that this could lead to faster execution. The results are below, and they can be useful as a reference. > With a source XML document of 1M elements, each of which has a single text node with a numeric value, on my PC (6 cores, Intel i78700 CPU @3.20GHz, 32.0GB RAM, x64based processor and 64bit Operating System (Windows 10)) I got these results: > > Saxon Ver./Edition > saxon:threads > Time in seconds > Remarks > SaxonEE 9.8.0.12 > 1 > 1.8 > > SaxonEE 9.8.0.12 > 2 > 15.7 > ??????????????? >  b b  > 4 > 5.7 > >  b b  > 6 > 4.3 > >  b b  > 8 > 3.7 > >  b b  > 12 > 3.0 > >  b b  > 16 > 2.9 > >  b b  > 20 > 2.7 > >  b b  > 24 > 2.7 > >  b b  > 28 > 2.6 > >  b b  > 32 > 2.6 > > SaxonPE 9.8.0.12 > N/A > 1.0 > ??????????????? > > Notice that the best result was reached when running SaxonPE, in which the saxon:threads attribute is ignored and no threading takes place. > > > > Also note that the best result with threading is when the number of threads is just 1. > > > > Here is a fragment of the source XML document, just to give you an idea of the input: > > <vectors> > <seq1> > <n>0.1</n> > <n>0.2</n> > <n>0.3</n> > <n>0.4</n> > <n>0.5</n> > <n>0.6</n> > <n>0.7</n> > <n>0.8</n> > <n>0.9</n> > <n>1</n> > <n>0.1</n> > <n>0.2</n> > <n>0.3</n> > <n>0.4</n> > <n>0.5</n> > <n>0.6</n> > <n>0.7</n> > <n>0.8</n> > <n>0.9</n> > <n>1</n> > <n>0.1</n> > > . . . . . . . . > > The <seq1> element has 1 000 000 (1M) child elements named "n". > > > > Then it has an immediate sibling element <seq2> with contents that was copied from <seq1>. > > > > Here is the transformation that uses saxon:threads: > > <xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform <http://www.w3.org/1999/XSL/Transform>" > xmlns:xs="http://www.w3.org/2001/XMLSchema <http://www.w3.org/2001/XMLSchema>" xmlns:saxon="http://saxon.sf.net/ <http://saxon.sf.net/>"> > <xsl:output method="text"/> > <xsl:variable name="v1" select="/*/seq1/*/text()/number()"/> > <xsl:variable name="v2" select="/*/seq2/*/text()/number()"/> > <xsl:variable name="vLength" select="1000000"/> > > <xsl:template match="/"> > <xsl:variable name="vProds" as="xs:double*"> > <xsl:foreach select="1 to $vLength" saxon:threads="6"> > <xsl:sequence select="$v1[current()] * $v2[current()]"/> > </xsl:foreach> > </xsl:variable> > > <xsl:valueof select="sum($vProds)"/> > </xsl:template> > </xsl:stylesheet> > > > > > > Cheers, > > Dimitre > > > On Sun, May 10, 2020 at 11:53 AM Dimitre Novatchev <dnovatchev@xxxxxxxxx <mailto:dnovatchev@xxxxxxxxx>> wrote: > For reference on fast sequential algorithms for computing the dot product, see this: > > https://lemire.me/blog/2018/07/05/howquicklycanyoucomputethedotproduct betweentwolargevectors/ <https://lemire.me/blog/2018/07/05/howquicklycanyoucomputethedotproduc tbetweentwolargevectors/> > > On Sat, May 9, 2020 at 10:52 AM Dimitre Novatchev <dnovatchev@xxxxxxxxx <mailto:dnovatchev@xxxxxxxxx>> wrote: > Hi Roger, > > > I need a superefficient way to compute the sum of A[i] * B[i] for i=1 to n. > > In case you have n cores / processors, and the compiler knows to optimize functions like map(), zipWith() (foreachpair() in XPath 3.0), then all processors can each do a corresponding multiplication in parallel in a single moment (computing cycle). > > Then what remains is the summing of the results of these multiplications. This essentially is the mapreduce technique. > > Interestingly enough, while not all reduce operations can be parallelized, in the case of sum() one can add together n numbers in Log2(n) summing steps  very similar to the DVC (DiVide and Conquer) technique, but doing it parallel  not sequentially. So, first there would be n/2 additions, then n/4 additions (of the results of the first step), then n/8 additions, and so on  in a total of ceiling(Log2(n)) steps. If for each steps we have sufficient number of processors (n/2 for the first step, less for the following steps), then the summing could be performed in Log2(n) computing cycles. > > So, it seems that the whole computation could be performed in something like ~ ceiling(Log2(n)) + 1 computing cycles. > > Or maybe I am being wrong here? :) Please, correct me. > > Cheers, > Dimitre > > > On Sat, May 9, 2020 at 4:59 AM Costello, Roger L. costello@xxxxxxxxx <mailto:costello@xxxxxxxxx> <xsllistservice@xxxxxxxxxxxxxxxxxxxxxx <mailto:xsllistservice@xxxxxxxxxxxxxxxxxxxxxx>> wrote: > Hi Folks, > > I need a superefficient way to compute the sum of A[i] * B[i] for i=1 to n. > > For example, suppose A is this: > > <row> > <col>0.9</col> > <col>0.3</col> > </row> > > and B is this: > > <row> > <col>0.2</col> > <col>0.8</col> > </row> > > I want to compute: > > (0.9 * 0.2) + (0.3 * 0.8) > > Here's one way to do it: > > sum(for $i in 1 to count($A/col) return number($A/col[$i]) * number($B/col[$i])) > > I suspect that is not the most efficient approach. > > What is the most efficient approach? I will be doing hundreds of thousands of these computations, so I want to use the most efficient approach. > > /Roger > > > > XSLList info and archive <http://www.mulberrytech.com/xsl/xsllist> > EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsllist/293509> (by email <>)
Current Thread 


< Previous  Index  Next > 

Re: [xsl] A superefficient way to , Dimitre Novatchev dn  Thread  Re: [xsl] A superefficient way to , Costello, Roger L. c 
Re: [xsl] A superefficient way to , Dimitre Novatchev dn  Date  [xsl] Anyone have XSLT that generat, Costello, Roger L. c 
Month 