Re: [xsl] Alphabetical index: unstreamable?

Subject: Re: [xsl] Alphabetical index: unstreamable?
From: "Michael Müller-Hillebrand mmh@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 2 Jun 2014 18:43:13 -0000
Abel,

Thanks a lot for the additional hints. That will help us a lot to avoid
pitfalls. Just recently I studied your transcribed talk from Prague (Thanks to
Roger C.) and learned a lot about the streaming restrictions.

Let me put it this way: This thread convinced one of our Java developers to
stop implementing a Java solution for this XSLT problem. It looks like we
would rather wait for the final XSLT 3.0 spec.

Thanks,

- Michael

PS: Unfortunately, one cannot visit every cool XML conference.

Am 02.06.2014 um 16:47 schrieb Abel Braaksma:

>
> On 28-5-2014 19:50, Michael Müller-Hillebrand mmh@xxxxxxxxx wrote:
>> Hi Dimitre,
>>
>> Do I understand correctly this could be as "simple" as defining an
>> accumulator that incrementally builds up a map? If the source contains
>> <indexterm> elements I could maybe do something similar to
>>
>> <xsl:accumulator name="indexterms" as="map(xs:string, element(indexterm))"
>>    initial-value="map{}">
>>    <xsl:accumulator-rule match="indexterm"
>>      new-value=" map:put($value, generate-id(), .) "/>
>> </xsl:accumulator>
>>
>> and at the end process the content of the accumulator?
>
> Yes, that is essentially how it is supposed to be done. However, there
> are a few caveats with the code snippet above:
>
> - accumulators must be motionless; they cannot consume the current node
> - you cannot store references to nodes; here you use ".", which is not
> allowed
> - childless nodes, such as text(), can be consumed, which comes in handy
> here
> - map:put was dropped, but it seems to re-emerge, see Public XSLT Spec
> Bug 24726 (https://www.w3.org/Bugs/Public/show_bug.cgi?id=24726)
>
> To create an accumulator for indexterm elements, we need to reverse the
> match pattern, so that the focus of the accumulator is on a non-element
> leaf-node (a childless node). For simplicity, let's assume DocBook
> <indexterm> markup like the following:
>
> <indexterm>
>    <primary>prim</primary>
>    <secondary>beginning</secondary>
> </indexterm>
>
> Then your accumulator could look like this:
>
> <xsl:accumulator name="indexterms"
>    as="map(xs:string, xs:string+)"
>    initial-value="map{}">
>    <xsl:accumulator-rule
>        match="text()[parent::primary | parent::secondary][ancestor::indexterm]"
>        new-value="map:put(
>            $value,
>            generate-id(ancestor::indexterm),
>            ($value(generate-id(ancestor::indexterm)), string(.)))" />
> </xsl:accumulator>
>
> This matches on the text-node, and consuming the text-node is allowed
> (it will always be childless). The fn:string(.) call is still required (or
> use fn:data, or fn:copy-of), because even though it is a childless node,
> you cannot store its reference in a map.
>
> The accumulator above will create a sequence of terms mapped to the
> indexterm-element, where the first term will be the <primary> element's
> content and the second in the sequence will be the <secondary>, if any.
>
> The expression inside the new-value attribute can quickly become
> unmanageable, but you can write a stylesheet function to write it
> declaratively.
>
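
A minimal sketch of such a helper function, assuming the final XSLT 3.0
syntax, where xsl:accumulator-rule takes a select attribute in place of the
new-value attribute used in this thread. The function name f:add-term and the
namespace http://example.com/functions are made up for illustration, and
streamability has not been checked against any particular processor.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="#all">

  <!-- Hypothetical helper: append $term to the sequence stored under $key. -->
  <xsl:function name="f:add-term" as="map(xs:string, xs:string+)">
    <xsl:param name="terms" as="map(xs:string, xs:string+)"/>
    <xsl:param name="key" as="xs:string"/>
    <xsl:param name="term" as="xs:string"/>
    <!-- $terms($key) is the empty sequence when the key is not yet present. -->
    <xsl:sequence select="map:put($terms, $key, ($terms($key), $term))"/>
  </xsl:function>

  <xsl:accumulator name="indexterms"
      as="map(xs:string, xs:string+)"
      initial-value="map{}">
    <!-- Only atomic values (strings) are passed to the function, never nodes. -->
    <xsl:accumulator-rule
        match="indexterm/primary/text() | indexterm/secondary/text()"
        select="f:add-term($value,
                           generate-id(ancestor::indexterm[1]),
                           string(.))"/>
  </xsl:accumulator>

</xsl:stylesheet>
```

The rule itself stays a one-line call, and the map-building logic can be
reused or tested separately from the accumulator.
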
> Note that you must be careful with fn:generate-id in a streaming
> scenario. With streaming it is likely you will have places where you use
> fn:copy-of or fn:snapshot. The IDs of these copied nodes will be different
> from the ones on the streamed nodes of the input stream.
>
> Note also that this won't help you if you want to place the resulting
> index prior to the nodes to be processed, i.e. a TOC at the beginning of
> a document cannot be created this way.
>
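
The end-of-document restriction can be illustrated with a sketch that reads
the finished map through accumulator-after() only after the content has been
processed. It is written against the final XSLT 3.0 syntax (select rather
than new-value) and deliberately omits streamable="yes", so it shows the
ordering rather than a verified streaming setup. The result element names
(document-with-index, body, index, entry) are invented for the example.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="#all">

  <!-- Collect primary/secondary strings per indexterm, keyed by generate-id. -->
  <xsl:accumulator name="indexterms"
      as="map(xs:string, xs:string+)"
      initial-value="map{}">
    <xsl:accumulator-rule
        match="indexterm/primary/text() | indexterm/secondary/text()"
        select="map:put($value,
                        generate-id(ancestor::indexterm[1]),
                        ($value(generate-id(ancestor::indexterm[1])), string(.)))"/>
  </xsl:accumulator>

  <xsl:template match="/">
    <document-with-index>
      <!-- The content is processed first ... -->
      <body>
        <xsl:apply-templates/>
      </body>
      <!-- ... and only afterwards is the complete map available. -->
      <xsl:variable name="terms" select="accumulator-after('indexterms')"/>
      <index>
        <xsl:for-each select="map:keys($terms)">
          <!-- Sort by primary term, then by secondary term (may be absent). -->
          <xsl:sort select="$terms(.)[1]"/>
          <xsl:sort select="$terms(.)[2]"/>
          <entry primary="{$terms(.)[1]}" secondary="{$terms(.)[2]}"/>
        </xsl:for-each>
      </index>
    </document-with-index>
  </xsl:template>

</xsl:stylesheet>
```

Moving the index element in front of body would require the map before the
input has been read, which is exactly the case described above as not
possible in a single pass.
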
> If you plan to attend the XML London 2014 conference this weekend, my
> talk will be about Streaming Design Patterns, common programming
> scenarios encountered in XSLT 2.0 and how to write them in a streamable
> way. From easy (such as matching patterns that depend on the child
> axis), to intermediate (such as working out following-sibling scenarios)
> to advanced (such as a streamable way to do sorting in a maximum of two
> passes).
>
> Cheers,
> Abel

