Re: [xsl] Alphabetical index: unstreamable?

Subject: Re: [xsl] Alphabetical index: unstreamable?
From: "Abel Braaksma (Exselt) abel@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 2 Jun 2014 14:47:37 -0000
On 28-5-2014 19:50, Michael Müller-Hillebrand mmh@xxxxxxxxx wrote:
> Hi Dimitre,
>
> Do I understand correctly this could be as "simple" as defining an accumulator that incrementally builds up a map? If the source contains <indexterm> elements I could maybe do something similar to
>
> <xsl:accumulator name="indexterms" as="map(xs:string, element(indexterm))"
>     initial-value="map{}">
>     <xsl:accumulator-rule match="indexterm" 
>       new-value=" map:put($value, generate-id(), .) "/>
>  </xsl:accumulator>
>
> and at the end process the content of the accumulator? 

Yes, that is essentially how it is supposed to be done. However, there
are a few caveats with the code snippet above:

- accumulators must be motionless; they cannot consume the current node
- you cannot store references to nodes; your snippet stores ".", which
is not allowed
- childless nodes, such as text(), can be consumed, which comes in handy
here
- map:put was dropped from the draft, but it seems to be re-emerging; see
Public XSLT Spec Bug 24726 (https://www.w3.org/Bugs/Public/show_bug.cgi?id=24726)

To create an accumulator for indexterm elements, we need to reverse the
match pattern so that the focus of the accumulator is on a non-element
leaf node (a childless node). For simplicity, let's assume a DocBook
<indexterm> such as the following:

<indexterm>
    <primary>prim</primary>
    <secondary>beginning</secondary>
</indexterm>

Then your accumulator could look like this:

<xsl:accumulator name="indexterms"
    as="map(xs:string, xs:string+)"
    initial-value="map{}">
    <!-- $value holds the accumulator's value before this rule fires;
         the accumulator cannot refer to itself by name -->
    <xsl:accumulator-rule
        match="text()[parent::primary | parent::secondary][ancestor::indexterm]"
        new-value="map:put(
            $value,
            generate-id(ancestor::indexterm),
            ($value(generate-id(ancestor::indexterm)), string(.)))" />
</xsl:accumulator>

This matches on the text node, and consuming the text node is allowed
(it is always childless). The fn:string(.) is still required (or use
fn:data or fn:copy-of), because even though it is a childless node, you
cannot store a reference to it in a map.

The accumulator above maps each indexterm element to a sequence of
terms, where the first term is the content of the <primary> element and
the second, if any, is the content of the <secondary> element.
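For the sample <indexterm> shown earlier, the map described here would
end up with a single entry. The key 'd1e12' below is made up for
illustration; real generate-id() values are implementation-dependent.

```xml
<!-- 'd1e12' is a hypothetical generate-id() value, for illustration only -->
<xsl:variable name="example-index" as="map(xs:string, xs:string+)"
    select="map{ 'd1e12' : ('prim', 'beginning') }"/>
```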

The expression inside the new-value attribute can quickly become
unmanageable, but you can move that logic into a stylesheet function to
keep the accumulator rule short and readable.
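A sketch of such a function, not a definitive implementation: the name
my:add-term and its namespace URI are hypothetical, and the rule keeps
the draft new-value attribute used in this thread (in the final XSLT 3.0
Recommendation that attribute is called select). The function only
receives the map and atomic values, so no node reference can end up in
the map.

```xml
<!-- Assumes these declarations on xsl:stylesheet (my: is hypothetical):
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:map="http://www.w3.org/2005/xpath-functions/map"
     xmlns:my="http://example.com/my-functions" -->
<xsl:function name="my:add-term" as="map(xs:string, xs:string+)">
    <xsl:param name="index" as="map(xs:string, xs:string+)"/>
    <xsl:param name="key" as="xs:string"/>
    <xsl:param name="term" as="xs:string"/>
    <!-- append $term to the terms already stored under $key, if any -->
    <xsl:sequence select="map:put($index, $key, ($index($key), $term))"/>
</xsl:function>

<xsl:accumulator name="indexterms"
    as="map(xs:string, xs:string+)"
    initial-value="map{}">
    <xsl:accumulator-rule
        match="text()[parent::primary | parent::secondary][ancestor::indexterm]"
        new-value="my:add-term($value, generate-id(ancestor::indexterm), string(.))"/>
</xsl:accumulator>
```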

Note that you must be careful with fn:generate-id in a streaming
scenario. With streaming you will likely have places where you use
fn:copy-of or fn:snapshot, and the IDs generated for those copies will
differ from the ones generated for the corresponding streamed nodes of
the input.
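A minimal sketch of that pitfall, using a hypothetical template in a
streamable mode (template and variable names are illustrative only). The
copy returned by fn:copy-of is a new node, so its generate-id() value
differs from that of the streamed original; any key you want to look up
in the map should therefore be computed from the streamed node itself.

```xml
<!-- Hypothetical template in a streamable mode, for illustration -->
<xsl:template match="indexterm">
    <!-- motionless: take the ID from the streamed node first -->
    <xsl:variable name="id" select="generate-id(.)"/>
    <!-- consuming: an in-memory copy, which is a brand-new node -->
    <xsl:variable name="copy" select="copy-of(.)"/>
    <!-- outputs "false": distinct nodes never share a generate-id() value -->
    <xsl:value-of select="$id eq generate-id($copy)"/>
</xsl:template>
```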

Note also that this won't help if you want to place the resulting index
before the nodes being processed: a TOC at the beginning of a document,
for example, cannot be created this way.
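The "process the content of the accumulator at the end" step from the
original question might look roughly like this. It is an untested sketch,
assuming the indexterms accumulator (a map from indexterm ID to its
terms) is applicable in a streamable mode, the usual xs and map namespace
prefixes, and a made-up output vocabulary (index, entry). Whether a
processor accepts the call to accumulator-after in this position depends
on the streamability rules it implements. The final value is only known
once the whole document has been read, which is why the index can only
be written after xsl:apply-templates.

```xml
<!-- Assumes e.g. <xsl:mode streamable="yes"/> and the indexterms
     accumulator; index/entry are hypothetical output elements -->
<xsl:template match="/">
    <!-- process (and consume) the whole document first -->
    <xsl:apply-templates/>
    <!-- only now is the accumulator's final value available -->
    <xsl:variable name="index" select="accumulator-after('indexterms')"/>
    <index>
        <!-- map keys have no defined order: sort by primary, then secondary -->
        <xsl:for-each select="map:keys($index)">
            <xsl:sort select="$index(.)[1]"/>
            <xsl:sort select="$index(.)[2]"/>
            <entry primary="{$index(.)[1]}" secondary="{$index(.)[2]}"/>
        </xsl:for-each>
    </index>
</xsl:template>
```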

If you plan to attend the XML London 2014 conference this weekend, my
talk will be about Streaming Design Patterns: common programming
scenarios encountered in XSLT 2.0 and how to write them in a streamable
way, ranging from easy (such as matching patterns that depend on the
child axis), through intermediate (such as working out following-sibling
scenarios), to advanced (such as a streamable way to do sorting in a
maximum of two passes).

Cheers,
Abel
