RE: [xsl] Another tokenize() question

Subject: RE: [xsl] Another tokenize() question
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Tue, 10 Aug 2004 19:08:00 +0100
> Ok.  This *basically* works, but with a line like:
> 
> <l>Why ha<supplied>l</supplied>dest &thorn;u were agaynes me</l>
> 
> it turns it into:
> 
> <l><w>Why</w> <w>ha</w><supplied>l</supplied><w>dest</w> 
> <w>&thorn;u</w>
> <w>were</w> <w>agaynes</w> <w>me</w></l>
> 
> or if I change it to l//text()
> 
> <l><w>Why</w> <w>ha</w><supplied><w>l</w></supplied><w>dest</w>
> <w>&thorn;u</w> <w>were</w> <w>agaynes</w> <w>me</w></l>
> 
> When really:
> 
> <l><w>Why</w> <w>ha<supplied>l</supplied>dest</w> <w>&thorn;u</w>
> <w>were</w> <w>agaynes</w> <w>me</w></l>
> 
> is what is wanted.

Presumably you have confidence that if an element starts in the middle of a
word, then it ends within the same word? Otherwise you have an interleaving
problem.
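
To illustrate, here is a hypothetical variant of your line (not taken from your data) in which the supplied text runs across a word boundary:

```xml
<!-- Hypothetical input: <supplied> starts inside one word and ends after the next -->
<l>Why ha<supplied>ldest &thorn;u</supplied> were agaynes me</l>
```

Here the <w> for "haldest" would have to open outside <supplied> and close inside it, which cannot be written as well-formed XML without splitting one of the two elements.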

You could start by replacing all the spaces with <sp/> elements, and then
process the structure along these lines (this needs XSLT 2.0):

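<xsl:template match="*">
  <!-- Each group begins at an <sp/> marker; any nodes before the first
       <sp/> form a leading group of their own. -->
  <xsl:for-each-group select="child::node()" group-starting-with="sp">
    <xsl:choose>
      <xsl:when test="self::sp">
        <!-- Wrap the nodes that follow the <sp/> marker as one word. -->
        <w><xsl:apply-templates select="current-group() except ."/></w>
      </xsl:when>
      <xsl:otherwise>
        <!-- Leading group with no <sp/> in front of it. -->
        <xsl:apply-templates select="current-group()"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each-group>
</xsl:template>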
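
The first step, replacing the spaces with <sp/> elements, is not spelled out above. A minimal sketch of one way to do it, assuming XSLT 2.0 (for xsl:analyze-string) and assuming that every run of whitespace in a text node counts as a single word break; the mode name "mark-spaces" is made up for this example:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Start processing in the (hypothetical) mark-spaces mode. -->
  <xsl:template match="/">
    <xsl:apply-templates mode="mark-spaces"/>
  </xsl:template>

  <!-- Copy elements and attributes unchanged. -->
  <xsl:template match="@*|*" mode="mark-spaces">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="mark-spaces"/>
    </xsl:copy>
  </xsl:template>

  <!-- In text nodes, replace each run of whitespace with an empty
       <sp/> element and keep everything else as text. -->
  <xsl:template match="text()" mode="mark-spaces">
    <xsl:analyze-string select="." regex="\s+">
      <xsl:matching-substring><sp/></xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With this, the text node "Why ha" becomes Why<sp/>ha, while the <supplied> element is copied through untouched. Note that line breaks and whitespace-only text nodes between elements also turn into <sp/> under this assumption, so you may want to narrow the select to the text inside <l>.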

Michael Kay
