Re: [xsl] Question on streaming and grouping with nested keys

Subject: Re: [xsl] Question on streaming and grouping with nested keys
From: "Felix Sasaki felix@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 14 Jul 2017 13:02:18 -0000
2017-07-14 14:41 GMT+02:00 Martin Honnen martin.honnen@xxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>:

> On 14.07.2017 14:05, Felix Sasaki felix@xxxxxxxxxxxxxx wrote:
>
>> I tried the example from Martin with
>>
>> <xsl:template match="TRANSACTION-LIST">
>>       <xsl:copy>
>>          <xsl:for-each-group select="copy-of(TRANSACTION)" group-by="ITEM2/SUBITEM2/GROUPING-KEY">
>>             <xsl:copy>
>>                <item1-sum><xsl:value-of select="sum(current-group()/ITEM2/SUBITEM2.1)"/></item1-sum>
>>
>> ...
>>
>> It gives me an out-of-memory error. The input file is 160MB, but the
>> individual transactions are rather small (around 20+ elements). The error
>> also appears if I remove "<xsl:copy>".
>>
>
> 160 MB doesn't sound like a file you need streaming for at all. Does that
> suggestion above cause memory problems only when using streaming (e.g. when
> you have <xsl:mode streamable="yes"/>) or also without streaming?



Without streaming it works.
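For comparison, a complete non-streaming version of the quoted snippet might look like the sketch below. It assumes TRANSACTION-LIST is the document element and keeps the element paths and the item1-sum name from the snippet (ITEM2/SUBITEM2/GROUPING-KEY for the key, ITEM2/SUBITEM2.1 for the summed value); the real input structure is not shown in the thread. Without streaming, copy-of() is not needed because the whole input tree is already in memory.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Non-streaming mode: the whole input document is built as a tree first -->
  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="TRANSACTION-LIST">
    <xsl:copy>
      <!-- Group the TRANSACTION elements directly; no copy-of() needed here -->
      <xsl:for-each-group select="TRANSACTION" group-by="ITEM2/SUBITEM2/GROUPING-KEY">
        <!-- Shallow copy of the first TRANSACTION in each group -->
        <xsl:copy>
          <item1-sum>
            <xsl:value-of select="sum(current-group()/ITEM2/SUBITEM2.1)"/>
          </item1-sum>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```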



> Have you tried increasing the memory for Saxon/Java?
>


No.


>
> As you mention Saxon EE, let's hope Michael Kay comes across this thread;
> he can certainly tell you more about how to tackle that problem with his
> product.
>
>> I have a working solution using an accumulator and maps (see below), but
>> I did not manage to make it work with streaming. If I set the accumulator
>> to streamable="yes", Saxon EE tells me:
>>
>>
>> "The xsl:accumulator-rule/@select expression for a streaming accumulator
>> must be motionless"
>>
>>
>> This happens although I am using copy-of() as in Martin's example.
>>
>>
>>   <xsl:accumulator name="gather-values" as="map(xs:anyAtomicType, node())" initial-value="map{}">
>>      <xsl:accumulator-rule match="TRANSACTION">
>>        <xsl:variable name="current" select="copy-of()"/>
>>
>
> As far as I understand it, you can't use copy-of() in an accumulator you
> want to be streamable. Working with streaming and accumulators requires
> changing the usual XSLT coding habits, I think. For instance, to capture
> the key with a streamable accumulator you would need to use e.g.
>      <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()" select="string()"/>
> as only on the text node are you able to read out that value while
> streaming through the document.
>
> So to solve that problem with accumulators and streaming, I think you
> need several of them: one counting ITEM1, one summing up
> SUBITEM2.1/text(), the one above for the key; then you need to combine
> them to store the data together.
>
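One possible way to combine those values, sketched here under several assumptions, is a single streamable accumulator whose value is a map. It holds the key and running sum of the TRANSACTION currently being read, plus a 'totals' map from grouping key to sum. Each xsl:accumulator-rule select expression either uses only $value or atomizes the matched text node, so it should be motionless. Using one map-valued accumulator instead of the several separate accumulators described above is a substitution, not something confirmed in the thread. The paths follow Martin's rule (SUBITEM2.2/GROUPING-KEY) and the original snippet (ITEM2/SUBITEM2.1); they differ from the snippet's group-by path, so adjust them to the real data. The accumulator name gather-totals and the output elements group and sum are hypothetical, ITEM1 counting is left out, and the sketch has not been run against Saxon EE.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="map">

  <!-- Streamed mode; the accumulator must be listed in use-accumulators -->
  <xsl:mode streamable="yes" use-accumulators="gather-totals" on-no-match="shallow-skip"/>

  <!-- Hypothetical name. 'key'/'sum' = state of the current TRANSACTION,
       'totals' = map from grouping key to running sum -->
  <xsl:accumulator name="gather-totals" as="map(*)" streamable="yes"
      initial-value="map{ 'key': '', 'sum': 0, 'totals': map{} }">

    <!-- Reset the per-transaction state when a TRANSACTION starts -->
    <xsl:accumulator-rule match="TRANSACTION" phase="start"
        select="map:put(map:put($value, 'key', ''), 'sum', 0)"/>

    <!-- Values are read from text nodes, the only place they are available
         while streaming (paths are assumptions taken from the thread) -->
    <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
        select="map:put($value, 'key', string())"/>
    <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.1/text()"
        select="map:put($value, 'sum', $value?sum + number())"/>

    <!-- When the TRANSACTION ends, add its sum to the total for its key -->
    <xsl:accumulator-rule match="TRANSACTION" phase="end"
        select="map:put($value, 'totals',
                  map:put($value?totals, $value?key,
                          ($value?totals($value?key), 0)[1] + $value?sum))"/>
  </xsl:accumulator>

  <xsl:template match="TRANSACTION-LIST">
    <xsl:copy>
      <!-- The single consuming construct: reads to the end of TRANSACTION-LIST
           and returns the final accumulator value -->
      <xsl:variable name="totals" select="accumulator-after('gather-totals')?totals"/>
      <!-- map:keys order is implementation-dependent, so sort for stable output -->
      <xsl:for-each select="map:keys($totals)">
        <xsl:sort select="."/>
        <group key="{.}">
          <sum><xsl:value-of select="$totals(.)"/></sum>
        </group>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With group-by, no group is complete until the whole input has been read, so the copied TRANSACTION elements all stay in memory even when streaming; that may be why the streamed version ran out of memory. The accumulator above keeps only one number per distinct key.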


Thanks. Working without accumulators is fine; I'm just trying to understand
the issue. Other input files are a bit bigger, up to 1.5 GB, so a streaming
solution would be nice, but it's not mandatory.

- Felix
