Re: [xsl] Using sibling value in streaming mode

Subject: Re: [xsl] Using sibling value in streaming mode
From: "Martynas Jusevičius martynas@xxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 31 Aug 2019 10:11:57 -0000
Thanks a lot for your suggestions, Martin and Michael. They are very
helpful for understanding how streaming works.

The maps are not that large; they are JSON objects coming from an API.

I see that reusing the IDs should be possible, but it adds some
complexity. That made me wonder: do I really need to reuse the IDs
from the input, or is it sufficient to produce any kind of ID that is
stable for a given <map>?

I have tried the following:

    <xsl:accumulator name="map-id" initial-value="()" streamable="yes"
                     as="xs:string?">
       <xsl:accumulator-rule match="/array/map" select="uuid:randomUUID()"/>
    </xsl:accumulator>

and this might just work for my purposes. I considered generate-id(.)
as an option, but I need globally unique IDs rather than
document-scoped ones.
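
For context, here is a minimal, untested sketch of how that accumulator
could sit in a complete streamable stylesheet. Several things in it are
assumptions rather than part of my actual code: the uuid prefix is bound
to java:java.util.UUID (a Saxon reflexive extension call), the result is
turned into a string with uuid:toString(), and the input is the
no-namespace <array>/<map>/<string> structure from my original message.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:uuid="java:java.util.UUID"
    exclude-result-prefixes="xs uuid"
    expand-text="yes">

  <!-- The streamed mode has to declare which accumulators it uses -->
  <xsl:mode streamable="yes" use-accumulators="map-id"
            on-no-match="shallow-skip"/>

  <!-- A fresh ID is computed when each <map> start tag is seen -->
  <xsl:accumulator name="map-id" initial-value="()" streamable="yes"
                   as="xs:string?">
    <!-- Assumption: uuid:toString() converts the Java UUID object to a string -->
    <xsl:accumulator-rule match="/array/map"
                          select="uuid:toString(uuid:randomUUID())"/>
  </xsl:accumulator>

  <xsl:template match="/array">
    <items>
      <xsl:apply-templates/>
    </items>
  </xsl:template>

  <!-- The rule above has already fired by the time any child <string>
       is reached, so accumulator-before() returns the ID of the
       enclosing <map>, whatever the position of the child -->
  <xsl:template match="/array/map/string">
    <item>
      <id>{accumulator-before('map-id')}</id>
      <key>{@key}</key>
      <val>{.}</val>
    </item>
  </xsl:template>

</xsl:stylesheet>
```

The IDs produced this way are stable within one <map> for a single
run only; a second run over the same input produces different values.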

On Sat, Aug 31, 2019 at 10:25 AM Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> I think Martin has provided several options quite well, but perhaps another
> angle will also be helpful.
>
> If the maps are reasonably small, then the simplest approach is "burst-mode"
> or "windowed" streaming: in the template rule with match="map", bind a
> variable to select="copy-of(.)", and then process the tree contained in that
> variable in normal unstreamed fashion.
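
A minimal, untested sketch of the burst-mode approach described above,
assuming the no-namespace <array>/<map>/<string> input from my original
message and a processor that supports streaming (Saxon-EE in my case).
The variable names are mine and purely illustrative.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    expand-text="yes">

  <xsl:mode streamable="yes" on-no-match="shallow-skip"/>

  <xsl:template match="/array">
    <items>
      <xsl:apply-templates select="map"/>
    </items>
  </xsl:template>

  <xsl:template match="/array/map">
    <!-- The only consuming step: copy the current <map> into memory -->
    <xsl:variable name="this-map" select="copy-of(.)"/>
    <!-- From here on everything works on the in-memory copy, so the
         id can be looked up wherever it occurs among the siblings -->
    <xsl:variable name="id" select="string($this-map/string[@key = 'id'])"/>
    <xsl:for-each select="$this-map/string">
      <item>
        <id>{$id}</id>
        <key>{@key}</key>
        <val>{.}</val>
      </item>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Memory use is then bounded by the largest single <map> rather than by
the whole document.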
>
> If you want to achieve some level of streaming within the map, then clearly
> it's not going to be perfect streaming; in the worst case, if the "id" comes
> last, then you're going to have to buffer something in memory. Burst-mode
> streaming buffers the input in memory; an alternative is to buffer the output,
> which you can achieve using xsl:fork:
>
> <!-- The {...} text value template needs expand-text="yes" in scope -->
> <xsl:template match="map" mode="streamed">
>    <xsl:fork>
>      <xsl:sequence>
>         <id>{string[@key='id']}</id>
>      </xsl:sequence>
>      <xsl:sequence>
>         <xsl:apply-templates select="string[not(@key='id')]" mode="streamed"/>
>      </xsl:sequence>
>    </xsl:fork>
> </xsl:template>
>
> If the maps are too large for that to be viable, then you could go for a
> two-pass solution. In the first streamed pass over the input document,
> construct an in-memory XDM map from position to id. In the second streamed
> pass, as each <map> element is encountered, output the id obtained from this
> XDM map, and then process all the children of the map (skipping the id) in
> streamed mode.
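
A rough, untested sketch of the two-pass idea, under several assumptions
of mine: the input is the no-namespace structure from my original
message, each <map> has exactly one id, and the stylesheet is handed the
URI of the document being transformed through a hypothetical input-uri
parameter so that the first pass can read the same file. Unlike the
description above, it does not skip the id child, because my required
output also has an <item> for it.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    expand-text="yes">

  <!-- Hypothetical parameter: URI of the document being transformed -->
  <xsl:param name="input-uri" as="xs:string" required="yes"/>

  <xsl:mode streamable="yes" on-no-match="shallow-skip"/>

  <!-- Pass 1: stream the document once, keeping only position -> id -->
  <xsl:variable name="ids" as="map(xs:integer, xs:string)">
    <xsl:source-document href="{$input-uri}" streamable="yes">
      <xsl:map>
        <xsl:for-each select="array/map">
          <xsl:map-entry key="position()"
                         select="string(string[@key = 'id'])"/>
        </xsl:for-each>
      </xsl:map>
    </xsl:source-document>
  </xsl:variable>

  <!-- Pass 2: the main streamed transformation -->
  <xsl:template match="/array">
    <items>
      <!-- Select only <map> children so that position() counts maps,
           matching the keys built in pass 1 -->
      <xsl:apply-templates select="map"/>
    </items>
  </xsl:template>

  <xsl:template match="/array/map">
    <xsl:variable name="id" select="$ids(position())"/>
    <xsl:for-each select="string">
      <item>
        <id>{$id}</id>
        <key>{@key}</key>
        <val>{.}</val>
      </item>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The lookup map holds one short string per <map>, so memory grows with
the number of maps but not with their size.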
>
> Another possibility that occurred to me is a self-merge. Use xsl:merge to
> merge the file with itself, using the <map> element's position() as the merge
> key (if that's possible); then extract the id from one of the merge inputs,
> and the other values from the other. But that still requires memory
> proportional to the largest map, because Saxon is going to hold the merge
> groups in memory (the semantics require an implicit call on snapshot()).
>
> Michael Kay
> Saxonica
>
> On 30 Aug 2019, at 22:18, Martynas Jusevičius martynas@xxxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I've started looking into streaming recently (using Saxon 9.9). I have
> a use case like this:
>
> Input:
>
> <array>
>    <map>
>       <string key="key1">value1</string>
>       ...
>       <string key="id">123456789</string>
>       ...
>       <string key="keyN">valueN</string>
>    </map>
>    ...
> </array>
>
> Required output:
>
> <items>
>    <item>
>       <id>123456789</id>
>       <key>key1</key>
>       <val>value1</val>
>    </item>
>    ...
>    <item>
>       <id>123456789</id>
>       <key>id</key>
>       <val>123456789</val>
>    </item>
>    ...
>    <item>
>       <id>123456789</id>
>       <key>keyN</key>
>       <val>valueN</val>
>    </item>
>    ...
> </items>
>
> The value of <string key="id"> is used as <id> in <item> elements. The
> problem is that <string key="id"> can occur in any position in the
> <map>.
>
> I've tried using an accumulator such as
>
> <xsl:accumulator name="map-id" initial-value="()" streamable="yes"
> as="xs:string?">
>   <xsl:accumulator-rule match="/array/map/string[@key = 'id']/text()"
> select="string(.)"/>
> </xsl:accumulator>
>
> and then
>
> <item>
>    <id><xsl:value-of select="accumulator-before('map-id')"/></id>
>    ...
> </item>
>
> That worked partially -- only for sibling <string> elements that
> followed the <string key="id">. Which is not surprising.
>
> I've also tried accumulator-after('map-id') but got:
>
>  XTSE3430: Template rule is not streamable
>  * A call to accumulator-after() is consuming when there are no
> preceding consuming instructions
>
> Is it possible to have a streaming solution in this case?
>
> Martynas