Re: [xsl] A super-efficient way to compute the sum of A[i] * B[i] for i=1 to n?

Subject: Re: [xsl] A super-efficient way to compute the sum of A[i] * B[i] for i=1 to n?
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 13 May 2020 03:27:45 -0000
I played a little bit using *saxon:threads*:
https://www.saxonica.com/html/documentation/extensions/attributes/threads.htm
l
,
with the hope that this could lead to faster execution. The results are
below, and they can be useful as a reference.
With a source XML document of 1M elements, each of which has a single text
node with a numeric value, on my PC (6 cores, Intel i7-8700 CPU @3.20GHz,
32.0GB RAM, x64-based processor and 64-bit Operating System (Windows 10)) I
got these results:

*Saxon Ver./Edition*

*saxon:threads*

*Time in seconds*

*Remarks*

Saxon-EE 9.8.0.12

 1

1.8



Saxon-EE 9.8.0.12

 2

15.7

???????????????

----  b -------b ----------

 4

5.7



----  b -------b ----------

 6

4.3



----  b -------b ----------

 8

3.7



----  b -------b ----------

12

3.0



----  b -------b ----------

16

2.9



----  b -------b ----------

20

2.7



----  b -------b ----------

24

2.7



----  b -------b ----------

28

2.6



----  b -------b ----------

32

2.6



Saxon-PE 9.8.0.12

N/A

1.0

???????????????



Notice that the best result was reached when running Saxon-PE, in which the
*saxon:threads* attribute is ignored and no threading takes place.


Also note that the best result with threading is when the number of threads
is just 1.


Here is a fragment of the source XML document, just to give you an idea of
the input:

<vectors>
<seq1>
 <n>0.1</n>
 <n>0.2</n>
 <n>0.3</n>
 <n>0.4</n>
 <n>0.5</n>
 <n>0.6</n>
 <n>0.7</n>
 <n>0.8</n>
 <n>0.9</n>
 <n>1</n>
 <n>0.1</n>
 <n>0.2</n>
 <n>0.3</n>
 <n>0.4</n>
 <n>0.5</n>
 <n>0.6</n>
 <n>0.7</n>
 <n>0.8</n>
 <n>0.9</n>
 <n>1</n>
 <n>0.1</n>

.  .  .  .  .  .  .  .

The <seq1> element has 1 000 000 (1M) child elements named "n".


Then it has an immediate sibling element <seq2> with contents that was
copied from <seq1>.


Here is the transformation that uses *saxon:threads*:

<xsl:stylesheet version="3.0" xmlns:xsl="
http://www.w3.org/1999/XSL/Transform";
 xmlns:xs="http://www.w3.org/2001/XMLSchema"; xmlns:saxon="
http://saxon.sf.net/";>
 <xsl:output method="text"/>
 <xsl:variable name="v1" select="/*/seq1/*/text()/number()"/>
 <xsl:variable name="v2" select="/*/seq2/*/text()/number()"/>
 <xsl:variable name="vLength" select="1000000"/>

  <xsl:template match="/">
    <xsl:variable name="vProds" as="xs:double*">
   <xsl:for-each select="1 to $vLength" saxon:threads="6">
     <xsl:sequence select="$v1[current()] * $v2[current()]"/>
   </xsl:for-each>
    </xsl:variable>

    <xsl:value-of select="sum($vProds)"/>
  </xsl:template>
</xsl:stylesheet>



Cheers,

Dimitre

On Sun, May 10, 2020 at 11:53 AM Dimitre Novatchev <dnovatchev@xxxxxxxxx>
wrote:

> For reference on fast sequential algorithms for computing the dot product,
> see this:
>
>
>
https://lemire.me/blog/2018/07/05/how-quickly-can-you-compute-the-dot-product
-between-two-large-vectors/
>
>
> On Sat, May 9, 2020 at 10:52 AM Dimitre Novatchev <dnovatchev@xxxxxxxxx>
> wrote:
>
>> Hi Roger,
>>
>>   > I need a super-efficient way to compute the sum of A[i] * B[i] for
>> i=1 to n.
>>
>> In case you have n cores / processors, and the compiler knows to optimize
>> functions like map(), zipWith() (for-each-pair() in XPath 3.0), then all
>> processors can each do a corresponding multiplication in parallel in a
>> single moment (computing cycle).
>>
>> Then what remains is the summing of the results of these multiplications.
>> This essentially is the map-reduce technique.
>>
>> Interestingly enough, while not all reduce operations can be
>> parallelized, in the case of sum() one can add together n numbers in
>> Log2(n) summing steps -- very similar to the DVC (DiVide and Conquer)
>> technique, but doing it parallel -- not sequentially. So, first there
would
>> be n/2 additions, then n/4 additions (of the results of the first step),
>> then n/8 additions, and so on -- in a total of ceiling(Log2(n)) steps. If
>> for each steps we have sufficient number of processors (n/2 for the first
>> step, less for the following steps), then the summing could be performed
in
>> Log2(n) computing cycles.
>>
>> So, it seems that the whole computation could be performed in something
>> like ~ ceiling(Log2(n)) + 1 computing cycles.
>>
>> Or maybe I am being wrong here? :)  Please, correct me.
>>
>> Cheers,
>> Dimitre
>>
>>
>> On Sat, May 9, 2020 at 4:59 AM Costello, Roger L. costello@xxxxxxxxx <
>> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> Hi Folks,
>>>
>>> I need a super-efficient way to compute the sum of A[i] * B[i] for i=1
>>> to n.
>>>
>>> For example, suppose A is this:
>>>
>>> <row>
>>>     <col>0.9</col>
>>>     <col>0.3</col>
>>> </row>
>>>
>>> and B is this:
>>>
>>> <row>
>>>     <col>0.2</col>
>>>     <col>0.8</col>
>>> </row>
>>>
>>> I want to compute:
>>>
>>> (0.9 * 0.2) + (0.3 * 0.8)
>>>
>>> Here's one way to do it:
>>>
>>> sum(for $i in 1 to count($A/col) return number($A/col[$i]) *
>>> number($B/col[$i]))
>>>
>>> I suspect that is not the most efficient approach.
>>>
>>> What is the most efficient approach? I will be doing hundreds of
>>> thousands of these computations, so I want to use the most efficient
>>> approach.
>>>
>>> /Roger

Current Thread