Re: [xsl] Using sibling value in streaming mode

Subject: Re: [xsl] Using sibling value in streaming mode
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 31 Aug 2019 08:25:36 -0000
I think Martin has provided several options quite well, but perhaps another
angle will also be helpful.

If the maps are reasonably small, then the simplest approach is "burst-mode"
or "windowed" streaming: In the template rule with match="map", bind a
variable to select="copy-of(.)", and then process the tree contained in that
variable in normal unstreamed fashion.
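
As a rough sketch of that burst-mode pattern (untested, and assuming the input
and output shapes from the original question; the variable names $m and $id
are just illustrative):

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    expand-text="yes">

  <!-- Declare the default (unnamed) mode as streamable. -->
  <xsl:mode streamable="yes"/>

  <xsl:template match="/array">
    <items>
      <xsl:apply-templates select="map"/>
    </items>
  </xsl:template>

  <xsl:template match="map">
    <!-- Buffer just this one map as an ordinary in-memory tree. -->
    <xsl:variable name="m" select="copy-of(.)"/>
    <xsl:variable name="id" select="string($m/string[@key = 'id'])"/>
    <!-- $m is grounded, so free navigation is allowed from here on. -->
    <xsl:for-each select="$m/string">
      <item>
        <id>{$id}</id>
        <key>{@key}</key>
        <val>{.}</val>
      </item>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Memory use is then bounded by the largest single map rather than by the whole
document, since each copy can be discarded once its template has finished.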

If you want to achieve some level of streaming within the map, then clearly
it's not going to be perfect streaming; in the worst case, if the "id" comes
last, then you're going to have to buffer something in memory. Burst-mode
streaming buffers the input in memory; an alternative is to buffer the output,
which you can achieve using xsl:fork:

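<xsl:template match="map" mode="streamed" expand-text="yes">
   <xsl:fork>
     <!-- First prong: picks out the id, wherever it occurs in the map. -->
     <xsl:sequence>
        <id>{string[@key='id']}</id>
     </xsl:sequence>
     <!-- Second prong: its output is buffered until the first prong is done. -->
     <xsl:sequence>
        <xsl:apply-templates select="string[not(@key='id')]" mode="streamed"/>
     </xsl:sequence>
   </xsl:fork>
</xsl:template>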

If the maps are too large for that to be viable, then you could go for a
two-pass solution. In the first streamed pass over the input document,
construct an in-memory XDM map from position to id. In the second streamed
pass, as each <map> element is encountered, output the id obtained from this
XDM map, and then process all the children of the map (skipping the id) in
streamed mode.
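
A sketch of how the two passes might look, again untested, so the
streamability analysis may need adjusting. The $input-uri parameter and its
default of 'input.xml' are hypothetical, and the stylesheet is assumed to be
invoked via its initial template rather than with a principal source document:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    expand-text="yes">

  <!-- Hypothetical parameter: URI of the document that is read twice. -->
  <xsl:param name="input-uri" as="xs:string" select="'input.xml'"/>

  <!-- Pass 1: streamed; retains only one id string per map. -->
  <xsl:variable name="ids" as="map(xs:integer, xs:string)">
    <xsl:source-document href="{$input-uri}" streamable="yes">
      <xsl:map>
        <xsl:for-each select="array/map">
          <xsl:map-entry key="position()"
                         select="string(string[@key = 'id'])"/>
        </xsl:for-each>
      </xsl:map>
    </xsl:source-document>
  </xsl:variable>

  <!-- Pass 2: streamed again; the id is looked up before the children are read. -->
  <xsl:template name="xsl:initial-template">
    <xsl:source-document href="{$input-uri}" streamable="yes">
      <items>
        <xsl:for-each select="array/map">
          <xsl:variable name="id" select="$ids(position())"/>
          <xsl:for-each select="string">
            <item>
              <id>{$id}</id>
              <key>{@key}</key>
              <val>{.}</val>
            </item>
          </xsl:for-each>
        </xsl:for-each>
      </items>
    </xsl:source-document>
  </xsl:template>

</xsl:stylesheet>
```

This version emits an item for the id entry too, as in the required output
from the original question; changing the inner select to
string[not(@key = 'id')] would skip it. The memory cost is one map entry per
<map> element, independent of how large each map is.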

Another possibility that occurred to me is a self-merge. Use xsl:merge to
merge the file with itself, using the <map> element's position() as the merge
key (if that's possible); then extract the id from one of the merge inputs,
and the other values from the other. But that still requires memory
proportional to the largest map, because Saxon is going to hold the merge
groups in memory (the semantics require an implicit call on snapshot()).

Michael Kay
Saxonica

> On 30 Aug 2019, at 22:18, Martynas Jusevičius martynas@xxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I've started looking into streaming recently (using Saxon 9.9). I have
> a use case like this:
>
> Input:
>
> <array>
>    <map>
>       <string key="key1">value1</string>
>       ...
>       <string key="id">123456789</string>
>       ...
>       <string key="keyN">valueN</string>
>    </map>
>    ...
> </array>
>
> Required output:
>
> <items>
>    <item>
>       <id>123456789</id>
>       <key>key1</key>
>       <val>value1</val>
>    </item>
>    ...
>    <item>
>       <id>123456789</id>
>       <key>id</key>
>       <val>123456789</val>
>    </item>
>    ...
>    <item>
>       <id>123456789</id>
>       <key>keyN</key>
>       <val>valueN</val>
>    </item>
>    ...
> </items>
>
> The value of <string key="id"> is used as <id> in <item> elements. The
> problem is that <string key="id"> can occur in any position in the
> <map>.
>
> I've tried using an accumulator such as
>
> <xsl:accumulator name="map-id" initial-value="()" streamable="yes"
> as="xs:string?">
>   <xsl:accumulator-rule match="/array/map/string[@key = 'id']/text()"
> select="string(.)"/>
> </xsl:accumulator>
>
> and then
>
> <item>
>    <id><xsl:value-of select="accumulator-before('map-id')"/></id>
>    ...
> </item>
>
> That worked partially -- only for sibling <string> elements that
> followed the <string key="id">. Which is not surprising.
>
> I've also tried accumulator-after('map-id') but got:
>
>  XTSE3430: Template rule is not streamable
>  * A call to accumulator-after() is consuming when there are no
> preceding consuming instructions
>
> Is it possible to have a streaming solution in this case?
>
> Martynas
