Re: [xsl] mixed content grouping by whitespace

Subject: Re: [xsl] mixed content grouping by whitespace
From: James Cummings <james@xxxxxxxxxxxxxxxxx>
Date: Mon, 12 Apr 2010 10:37:17 +0100
On Sun, Apr 11, 2010 at 20:17, Imsieke, Gerrit, le-tex
<gerrit.imsieke@xxxxxxxxx> wrote:
> I applied a two-step process:
> 1. Mark up whitespace using intermediate <seg @type="sep"> </seg>;
> 2. group adjacent WS (and non-WS) nodes, put the non-WS groups in a newly
> created w element.

Two solutions from Gerrit and Ken... but I've got some questions to
help my understanding...

>   <xsl:template match="tei:seg">
>     <xsl:copy>

This is taking place in an xsl:copy to copy the surrounding tei:seg
element, right?

>       <xsl:variable name="sep">
>         <xsl:apply-templates mode="sep" />
>       </xsl:variable>

This is the first pass: it wraps each run of whitespace (the matching
substring) in a seg/@type='sep' and puts out the rest of the text (the
non-matching substring) unchanged, with child elements being copied by
a copy-all template.
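
To check my reading of the first pass, here is what I'd expect $sep to
hold for a made-up input. The input is my own example, not from your
message, and I'm assuming the copy-all template copies child elements
unchanged:

```xml
<!-- Hypothetical input (my example, TEI namespace assumed on seg) -->
<seg xmlns="http://www.tei-c.org/ns/1.0">some <hi>mix</hi>ed content</seg>

<!-- Expected content of $sep, one node per line for readability
     (namespace declarations omitted):
       some                      text node
       <seg type="sep"> </seg>   temporary whitespace marker
       <hi>mix</hi>              copied element
       ed                        text node
       <seg type="sep"> </seg>   temporary whitespace marker
       content                   text node
-->
```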

>       <xsl:for-each-group select="$sep/node()"
>         group-adjacent="boolean(self::tei:seg[@type='sep'])">

This groups the nodes in the variable you've created by a boolean:
whether or not each node is one of the tei:seg[@type='sep'] elements
you created from tei:seg/text() to mark the whitespace. (So the key is
just the truth or falsehood of that test? I didn't know you could do
that in a group-* attribute.)
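
A toy experiment to convince myself that a boolean works as a grouping
key. This uses numbers instead of nodes and is entirely my own example;
the group element and its even attribute are made up for illustration:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Runs against any input document; the input itself is ignored. -->
  <xsl:template match="/">
    <groups>
      <!-- The key is evaluated once per item; neighbouring items with
           the same key value (true or false) land in the same group. -->
      <xsl:for-each-group select="(1, 3, 2, 4, 5)"
                          group-adjacent=". mod 2 = 0">
        <group even="{current-grouping-key()}">
          <xsl:value-of select="current-group()"/>
        </group>
      </xsl:for-each-group>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

That should give three groups: even="false" holding "1 3", even="true"
holding "2 4", and even="false" holding "5". So the key only has to be
an expression that yields an atomic value, and a boolean is one.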

>         <xsl:choose>
>           <xsl:when test="current-grouping-key()">
>             <xsl:value-of select="current-group()" />
>           </xsl:when>

When the group is a run of those whitespace segs, just put out the
whitespace itself, so the temporary element vanishes.

>           <xsl:otherwise>
>             <w xmlns="http://www.tei-c.org/ns/1.0">
>               <xsl:apply-templates select="current-group()"/>
>             </w>
>           </xsl:otherwise>

Otherwise, wrap the group in a w (word) element.

>   <xsl:template match="tei:seg/text()" mode="sep">
>     <xsl:analyze-string select="." regex="\s+">
>       <xsl:matching-substring>
>         <seg type="sep" xmlns="http://www.tei-c.org/ns/1.0">
>           <xsl:value-of select="."/>
>         </seg>
>       </xsl:matching-substring>
>       <xsl:non-matching-substring>
>         <xsl:value-of select="."/>
>       </xsl:non-matching-substring>
>     </xsl:analyze-string>
>   </xsl:template>

This runs analyze-string with a whitespace regex over the text nodes
directly inside tei:seg: a match is wrapped in a new seg, anything else
is just put out as is. The template is in mode 'sep', so it is only
applied while building the sep variable above.
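
Putting the pieces together as I understand them, here is a minimal
sketch of the whole stylesheet. The identity template, the copy-all
template in mode "sep" and the attribute copy on tei:seg are my
assumptions; they were not in the part of your message I quoted:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- Assumption: identity template for everything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="tei:seg">
    <xsl:copy>
      <!-- Assumption: keep the attributes of the original seg. -->
      <xsl:copy-of select="@*"/>
      <!-- Pass 1: mark each whitespace run with a temporary seg. -->
      <xsl:variable name="sep">
        <xsl:apply-templates mode="sep"/>
      </xsl:variable>
      <!-- Pass 2: group adjacent marker and non-marker nodes. -->
      <xsl:for-each-group select="$sep/node()"
          group-adjacent="boolean(self::tei:seg[@type='sep'])">
        <xsl:choose>
          <!-- Marker group: emit the whitespace, drop the marker. -->
          <xsl:when test="current-grouping-key()">
            <xsl:value-of select="current-group()"/>
          </xsl:when>
          <!-- Everything else between two markers becomes one w. -->
          <xsl:otherwise>
            <w xmlns="http://www.tei-c.org/ns/1.0">
              <xsl:apply-templates select="current-group()"/>
            </w>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Text directly inside seg: split on whitespace runs. -->
  <xsl:template match="tei:seg/text()" mode="sep">
    <xsl:analyze-string select="." regex="\s+">
      <xsl:matching-substring>
        <seg type="sep" xmlns="http://www.tei-c.org/ns/1.0">
          <xsl:value-of select="."/>
        </seg>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Assumption: the copy-all template for child elements in pass 1. -->
  <xsl:template match="*" mode="sep">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

For a made-up input such as
<seg xmlns="http://www.tei-c.org/ns/1.0">some <hi>mix</hi>ed content</seg>
I'd expect <w>some</w> <w><hi>mix</hi>ed</w> <w>content</w> inside the
copied seg, with the whitespace left between the w elements.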

Cool! Thanks Gerrit, I certainly learned something new!

-James
