Re: [xsl] Performance results of an XML-based neural network versus a map-based neural network

From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 2 Aug 2020 07:26:35 -0000

I'm hoping to find time to investigate this (though it's not top of my
priority list: next week I've got a lot of croquet planned...). I've opened a
tracker at

https://saxonica.plan.io/issues/4663

and if you're interested in the investigation or its conclusions, please
follow it there. There's a five-pointed star at the top right, which is the
cryptic icon that you need to click to "watch" (be informed of progress on)
the issue - you'll probably need to register, which sadly involves asking some
silly questions designed to deter robots. (Unlike Balisage, though, they don't
require familiarity with the appearance and nomenclature of American street
furniture.)

Michael Kay
Saxonica



> On 22 Jul 2020, at 17:19, Michael Kay mike@xxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>
>>
>> There are lots of matrix operations involved in running a neural network --
>> lots of matrix addition, transpose, dot product, etc., operations. The
>> operations update the <Matrix> elements in <neuralNetwork>.
>>
>> I used the SAXON profile tool to see the performance of my implementation.
>> The performance of the matrix operations was very slow. Here's the
>> performance of two of the matrix operations:
>>
>> matrix:addition
>> average time (net/ms) = 273.170
>> total time (net/ms) = 27,043.813
>>
>> matrix:dot-product
>> average time (net/ms) = 257.718
>> total time (net/ms) = 25,514.069
>>
>> [Michael Kay: what does "net/ms" mean?]
>
> It means the net time spent in a particular routine, not counting the time
> spent in the subroutines that it calls, measured in milliseconds.
>>
>> In my second implementation I converted the first implementation to be
>> map-based. I replaced the above XML with this map:
>>
>> <xsl:map>
>>    <!-- Set number of nodes in each input, hidden, output layer -->
>>    <xsl:map-entry key="'inodes'" select="784"/>
>>    <xsl:map-entry key="'hnodes'" select="100"/>
>>    <xsl:map-entry key="'onodes'" select="10"/>
>>    <!-- Learning rate -->
>>    <xsl:map-entry key="'lr'" select="0.3"/>
>>    <!-- weights between the input layer and the hidden layer (wih) -->
>>    <xsl:map-entry key="'wih'" select="(-0.015882097402764903,
>>        0.04906187053472448, -0.025639452565869168, ...)"/>
>>    <!-- weights between the hidden layer and the output layer (who) -->
>>    <xsl:map-entry key="'who'" select="(-0.029846534548482826,
>>        0.09713823372280408, -0.07405568240941922, ...)"/>
>> </xsl:map>
>>
>> I again used the SAXON profile tool to see the performance. The performance
>> of the matrix operations for this implementation was astoundingly fast.
>> Here's the performance of two of the matrix operations:
>>
>> matrix:addition
>> average time (net/ms) = 0.003
>> total time (net/ms) = 0.254
>>
>> matrix:dot-product
>> average time (net/ms) = 0.001
>> total time (net/ms) = 0.131
>>
>> For all the matrix operations the map-based version was orders of magnitude
>> faster than the XML-based version (on the order of 100,000 times faster in
>> the two cases above).
>
> Good to know, but without seeing the detail of the operations it's
> impossible to provide explanations. The key point is probably that updating
> maps is much faster than updating XML trees, because updating XML trees
> requires all the unchanged subtrees to be copied.
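
A minimal sketch of that difference, under stated assumptions: the actual
matrix XML was not shown in this thread, so the <Matrix>/<row> layout, the
example namespace, and both function names below are hypothetical. The first
function replaces one row of an XML matrix, which means building a new tree
and copying every unchanged row into it. The second replaces one entry of a
map shaped like the one quoted above, using the standard map:put function.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:m="http://example.com/matrix-sketch"
    exclude-result-prefixes="#all">

  <!-- Hypothetical XML layout: <Matrix><row>...</row><row>...</row></Matrix>.
       Nodes cannot be modified in place, so replacing row $r means
       constructing a new <Matrix> and copying all the other rows into it. -->
  <xsl:function name="m:replace-row" as="element(Matrix)">
    <xsl:param name="matrix" as="element(Matrix)"/>
    <xsl:param name="r" as="xs:integer"/>
    <xsl:param name="new-row" as="element(row)"/>
    <Matrix>
      <xsl:copy-of select="$matrix/row[position() lt $r]"/>
      <xsl:copy-of select="$new-row"/>
      <xsl:copy-of select="$matrix/row[position() gt $r]"/>
    </Matrix>
  </xsl:function>

  <!-- Map version: map:put returns a new map in which one entry is replaced.
       The other entries ('inodes', 'lr', the other weight sequence, ...)
       are not rebuilt by this code. -->
  <xsl:function name="m:replace-weights" as="map(*)">
    <xsl:param name="network" as="map(*)"/>
    <xsl:param name="key" as="xs:string"/>
    <xsl:param name="weights" as="xs:double*"/>
    <xsl:sequence select="map:put($network, $key, $weights)"/>
  </xsl:function>

</xsl:stylesheet>
```

For example, m:replace-weights($nn, 'wih', $new-wih) yields a network map
with new input-to-hidden weights. Whether the processor physically shares the
untouched entries between the old and new map is an implementation matter;
the specification only requires that map:put return a new map.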
>>
>> Surprisingly, however, the overall time to train the XML-based neural
>> network was faster than the time to train the map-based neural network:
>>
>> neural-network:train
>>
>> XML-based:
>> average time (net/ms) = 1711.572
>> total time (net/ms) = 169,445.644
>>
>> map-based:
>> average time (net/ms) = 3633.811
>> total time (net/ms) = 359,747.295
>>
>> I don't understand how this could possibly happen.
>
> With performance, not understanding the numbers is the normal state of
> affairs.
>
> The problem with this kind of data is that it might be illustrating a
> general principle that applies to a wide range of workloads, or it might be
> some highly peculiar quirk of a particular construct that you used (I always
> tell people that the devil is in the detail). It's impossible to know without
> drilling down.
>
> Michael Kay
> Saxonica
