Re: [xsl] Question on streaming and grouping with nested keys

Subject: Re: [xsl] Question on streaming and grouping with nested keys
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 14 Jul 2017 14:13:23 -0000
On 14.07.2017 15:02, Felix Sasaki felix@xxxxxxxxxxxxxx wrote:


2017-07-14 14:41 GMT+02:00 Martin Honnen martin.honnen@xxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>:

    On 14.07.2017 14:05, Felix Sasaki felix@xxxxxxxxxxxxxx wrote:

I tried the example from Martin with

        <xsl:template match="TRANSACTION-LIST">
               <xsl:copy>
                  <xsl:for-each-group select="copy-of(TRANSACTION)"
                                      group-by="ITEM2/SUBITEM2/GROUPING-KEY">
                     <xsl:copy>
                        <item1-sum><xsl:value-of
                            select="sum(current-group()/ITEM2/SUBITEM2.1)"/></item1-sum>

...

        It gives me an out-of-memory error. The input file is 160MB, but the
        individual transactions are rather small (around 20+ elements).
        The error also appears if I remove "<xsl:copy>".


160 MB doesn't sound like a file you need streaming for at all. Does
that suggestion above cause memory problems only when using
streaming (e.g. when you have <xsl:mode streamable="yes"/>) or also
without streaming?




Without streaming it works.

That sounds odd.




Thanks. Working without accumulators is fine, just trying to understand the issue. Other input files are a bit bigger, up to 1.5 GB, so having a streaming solution would be nice but it's not mandatory.
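For comparison, since the non-streamed run works and 160 MB should fit in memory, here is a minimal non-streaming sketch of the same aggregation with a plain xsl:for-each-group. It assumes the element structure of the sample data further down (ITEM2/SUBITEM2.2/GROUPING-KEY; the quoted snippet uses ITEM2/SUBITEM2/GROUPING-KEY, so adjust the path to the real data) and takes "count" to mean the number of ITEM1 elements per group, as the item1-count accumulator below does. It has not been run against the real input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    expand-text="true"
    version="3.0">

    <!-- no streamable="yes": the whole input is built as an in-memory tree -->
    <xsl:mode on-no-match="shallow-skip"/>

    <xsl:output indent="yes"/>

    <xsl:template match="TRANSACTION-LIST">
        <xsl:copy>
            <!-- one group per distinct GROUPING-KEY value, in order of first appearance -->
            <xsl:for-each-group select="TRANSACTION" group-by="ITEM2/SUBITEM2.2/GROUPING-KEY">
                <transaction key="{current-grouping-key()}">
                    <!-- number of ITEM1 elements across all transactions of the group -->
                    <count>{count(current-group()/ITEM1)}</count>
                    <!-- SUBITEM2.1 values summed as integers, like the 'subitem' accumulator -->
                    <amount>{sum(current-group()/ITEM2/SUBITEM2.1 ! xs:integer(.))}</amount>
                </transaction>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

With group-by, a processor in general has to keep every group member until the whole input has been read, because a later TRANSACTION may still belong to an earlier group. So in the streamed copy-of(TRANSACTION) variant all the copies are retained anyway, and streaming is unlikely to reduce peak memory for that approach.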

I have now tried to solve it with streaming accumulators, using


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="xs math map"
    expand-text="true"
    version="3.0">

    <xsl:param name="STREAMABLE" as="xs:boolean" static="yes" select="true()"/>

    <xsl:mode _streamable="{$STREAMABLE}" on-no-match="shallow-skip" use-accumulators="item1-count subitem groups"/>

    <xsl:output indent="yes"/>

    <xsl:accumulator name="item1-count" as="xs:integer" initial-value="0" _streamable="{$STREAMABLE}">
        <xsl:accumulator-rule match="TRANSACTION" select="0"/>
        <xsl:accumulator-rule match="TRANSACTION/ITEM1" select="$value + 1"/>
    </xsl:accumulator>

    <xsl:accumulator name="subitem" as="xs:integer" initial-value="0" _streamable="{$STREAMABLE}">
        <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.1/text()" select="xs:integer(.)"/>
    </xsl:accumulator>

    <xsl:accumulator name="groups" as="map(xs:string, map(xs:string, xs:integer))" initial-value="map{}" _streamable="{$STREAMABLE}">
        <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
            select="let $key := string(),
                        $count := accumulator-before('item1-count'),
                        $sum := accumulator-before('subitem')
                    return if (not(map:contains($value, $key)))
                           then map:put($value, $key, map { 'count' : $count, 'sum' : $sum })
                           else let $value-map := $value($key)
                                return map:put($value, $key, map { 'count' : $count + $value-map?count, 'sum' : $sum + $value-map?sum })"/>
    </xsl:accumulator>

    <xsl:template match="TRANSACTION-LIST">
        <xsl:copy>
            <xsl:apply-templates/>
            <xsl:variable name="groups" select="accumulator-after('groups')"/>
            <xsl:for-each select="map:keys($groups)">
                <transaction key="{.}">
                    <count>{$groups(.)?count}</count>
                    <amount>{$groups(.)?sum}</amount>
                </transaction>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

I had thought that, when matching on a text() node, it is possible to consume its value, and indeed Saxon does not complain about the accumulator:

<xsl:accumulator name="subitem" as="xs:integer" initial-value="0" _streamable="{$STREAMABLE}">
    <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.1/text()" select="xs:integer(.)"/>
</xsl:accumulator>


However, for the more complex one


<xsl:accumulator name="groups" as="map(xs:string, map(xs:string, xs:integer))" initial-value="map{}" _streamable="{$STREAMABLE}">
    <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
        select="let $key := string(),
                    $count := accumulator-before('item1-count'),
                    $sum := accumulator-before('subitem')
                return if (not(map:contains($value, $key)))
                       then map:put($value, $key, map { 'count' : $count, 'sum' : $sum })
                       else let $value-map := $value($key)
                            return map:put($value, $key, map { 'count' : $count + $value-map?count, 'sum' : $sum + $value-map?sum })"/>
</xsl:accumulator>


it continues to complain with

Static error at xsl:accumulator-rule on line 33 column 136 of count-sum-accum1.xsl:
XTSE3430: The xsl:accumulator-rule/@select expression (or contained sequence constructor)
for a streaming accumulator must be motionless


As I have no other implementation to test (the Feb 2016 build of Exselt is too old to support the XSLT 3.0 final spec syntax details), I can't tell whether Saxon is right, and I am afraid I still get lost when doing streamability analysis by hand.
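One way to narrow this down (an untested idea, not a verified diagnosis) is to fold the three accumulators into a single map-valued accumulator, so that no accumulator rule has to call accumulator-before. Each rule then only reads $value and, at most, atomizes the matched text node, which is the same kind of select expression Saxon already accepts for the subitem accumulator. The accumulator name "state" and its map keys are hypothetical, and unlike the subitem accumulator above, this version resets the pending sum at the start of each TRANSACTION. If Saxon accepts it, comparing it with the groups accumulator should help isolate which construct the motionless check objects to.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="xs map"
    expand-text="true"
    version="3.0">

    <xsl:mode streamable="yes" on-no-match="shallow-skip" use-accumulators="state"/>

    <xsl:output indent="yes"/>

    <!-- hypothetical single accumulator:
         'count' and 'sum' hold the values of the current TRANSACTION,
         'groups' maps each GROUPING-KEY to its running totals -->
    <xsl:accumulator name="state" as="map(*)" streamable="yes"
        initial-value="map { 'count' : 0, 'sum' : 0, 'groups' : map {} }">
        <!-- start of a TRANSACTION: reset the per-transaction values -->
        <xsl:accumulator-rule match="TRANSACTION"
            select="map:put(map:put($value, 'count', 0), 'sum', 0)"/>
        <xsl:accumulator-rule match="TRANSACTION/ITEM1"
            select="map:put($value, 'count', $value?count + 1)"/>
        <!-- atomizes the matched text node, as the 'subitem' accumulator does -->
        <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.1/text()"
            select="map:put($value, 'sum', xs:integer(.))"/>
        <!-- fold the current transaction's values into the totals for its key -->
        <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
            select="let $key := string(.),
                        $old := (map:get($value?groups, $key), map { 'count' : 0, 'sum' : 0 })[1]
                    return map:put($value, 'groups',
                               map:put($value?groups, $key,
                                   map { 'count' : $old?count + $value?count,
                                         'sum' : $old?sum + $value?sum }))"/>
    </xsl:accumulator>

    <xsl:template match="TRANSACTION-LIST">
        <xsl:copy>
            <xsl:apply-templates/>
            <xsl:variable name="groups" select="accumulator-after('state')?groups"/>
            <xsl:for-each select="map:keys($groups)">
                <transaction key="{.}">
                    <count>{$groups(.)?count}</count>
                    <amount>{$groups(.)?sum}</amount>
                </transaction>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

As in the original, the output order of the transaction elements comes from map:keys and is therefore implementation-dependent.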

When I disable streaming, the code seems to give the right result on some simplified test data:

<?xml version="1.0" encoding="UTF-8"?>
<TRANSACTION-LIST>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>a</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>b</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>c</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>a</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>b</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>c</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
</TRANSACTION-LIST>
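Worked through by hand, the sample has three keys, each occurring in two TRANSACTIONs that each contain one ITEM1 and a SUBITEM2.1 of 10, so every group should come out with a count of 2 and an amount of 20. For the accumulator version the order of the transaction elements follows map:keys and may differ; the content should correspond to:

```xml
<TRANSACTION-LIST>
    <transaction key="a">
        <count>2</count>
        <amount>20</amount>
    </transaction>
    <transaction key="b">
        <count>2</count>
        <amount>20</amount>
    </transaction>
    <transaction key="c">
        <count>2</count>
        <amount>20</amount>
    </transaction>
</TRANSACTION-LIST>
```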
