Re: [xsl] Question on streaming and grouping with nested keys

Subject: Re: [xsl] Question on streaming and grouping with nested keys
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 14 Jul 2017 12:41:55 -0000
On 14.07.2017 14:05, Felix Sasaki felix@xxxxxxxxxxxxxx wrote:

I tried the example from Martin with

<xsl:template match="TRANSACTION-LIST">
<xsl:copy>
<xsl:for-each-group select="copy-of(TRANSACTION)" group-by="ITEM2/SUBITEM2/GROUPING-KEY">
<xsl:copy>
<item1-sum><xsl:value-of select="sum(current-group()/ITEM2/SUBITEM2.1)"/></item1-sum>


...

It gives me an out of memory error. The input file is 160MB, but the individual transactions are rather small (around 20+ elements). The error also appears if I remove "<xsl:copy>".

160 MB doesn't sound like a file you need streaming for at all. Does that suggestion above cause memory problems only when using streaming (e.g. when you have <xsl:mode streamable="yes"/>) or also without streaming? Have you tried increasing the memory for Saxon/Java?


As you mention Saxon EE, let's hope Michael Kay comes across this thread; he can certainly tell you more about how to tackle that problem with his product.

I have a working solution using an accumulator and maps, see below, but here I did not manage to use streaming. If I set the accumulator to streamable="yes", Saxon EE tells me


"The xsl:accumulator-rule/@select expression for a streaming accumulator must be motionless"



Although I am using copy-of() as in Martin's example.



<xsl:accumulator name="gather-values" as="map(xs:anyAtomicType, node())" initial-value="map{}">
<xsl:accumulator-rule match="TRANSACTION">
<xsl:variable name="current" select="copy-of()"/>

As far as I understand it, you can't use copy-of() in an accumulator you want to be streamable. Working with streaming and accumulating values requires a change of the usual XSLT coding habits, I think. For instance, to capture your key with a streaming accumulator you would need to use e.g.
<xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()" select="string()"/>
because only at the text node itself can you read out that value while streaming through the document.


So to try to solve that problem with accumulators and streaming, I think you need several of them: one counting ITEM1, one summing up SUBITEM2.1/text(), and the one above for the key. Then you need to combine them to store the data together.
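
A minimal, untested sketch of that idea could look like the following. Instead of several separate accumulators it folds the key of the current transaction, its running sum, and the totals per key into one map-valued accumulator, so that no rule has to read another accumulator. The accumulator name "state", the map entries "key", "amount" and "totals", and the output element "group" are made up for illustration. The element paths are taken from the snippets in this thread and may not match the real input. It assumes that each GROUPING-KEY and SUBITEM2.1 element contains a single text node, and it has not been run against Saxon EE, so the streamability analysis may still object to parts of it.

<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="#all">

  <!-- stream the input; nodes not matched by a template are skipped -->
  <xsl:mode streamable="yes" use-accumulators="state" on-no-match="shallow-skip"/>

  <!-- "state" is a hypothetical name; the map layout is an assumption -->
  <xsl:accumulator name="state" as="map(xs:string, item()*)" streamable="yes"
      initial-value="map{'key': (), 'amount': 0, 'totals': map{}}">
    <!-- start of a TRANSACTION: forget the previous key and amount -->
    <xsl:accumulator-rule match="TRANSACTION"
        select="map:put(map:put($value, 'key', ()), 'amount', 0)"/>
    <!-- the key can only be read while its text node passes by -->
    <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
        select="map:put($value, 'key', string())"/>
    <!-- add each amount to the sum of the current TRANSACTION -->
    <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.1/text()"
        select="map:put($value, 'amount', $value?amount + xs:decimal(.))"/>
    <!-- end of a TRANSACTION: add its sum to the total stored for its key -->
    <xsl:accumulator-rule match="TRANSACTION" phase="end"
        select="let $k := $value?key, $t := $value?totals
                return if (exists($k))
                       then map:put($value, 'totals',
                              map:put($t, $k, (map:get($t, $k), 0)[1] + $value?amount))
                       else $value"/>
  </xsl:accumulator>

  <xsl:template match="TRANSACTION-LIST">
    <xsl:copy>
      <!-- consume the children first so the accumulator sees every TRANSACTION -->
      <xsl:apply-templates/>
      <!-- read the final value only after the consuming instruction above -->
      <xsl:variable name="totals" as="map(*)" select="accumulator-after('state')?totals"/>
      <xsl:for-each select="map:keys($totals)">
        <group key="{.}" sum="{$totals(.)}"/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Only the totals per key are kept in memory here, not the transactions themselves, which is the point of avoiding copy-of(). Counting ITEM1 would need one more map entry and one more rule in the same style.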
