Re: [xsl] Another tokenize() question

From: James Cummings <James.Cummings@xxxxxxxxxxxxxx>
Date: Tue, 10 Aug 2004 18:07:26 +0100 (BST)
On Tue, 10 Aug 2004, David Carlisle wrote:
> <xsl:template match="l">
>  <l>
>   <xsl:apply-templates/>
>  </l>
> </xsl:template>
> 
> <xsl:template match="l/text()"> <!-- or l//text() according to taste-->
>  <xsl:analyse-string regexp="\w+" select=".">
>    <xsl:matching-substring><w><xsl:value-of select="."/></w></xsl:matching-substring>
>    <xsl:non-matching-substring><xsl:value-of select="."/></xsl:non-matching-substring>
>  </xsl:analyse-string>
> </xsl:template>
> 
> Now if you do it this way you don't just get a list of w elements, one for
> each word; you get them in situ, and around them you get your non-word
> characters and anything that came from the apply-templates in the
> template for l.


Always first to respond and teach me something new.  I didn't know 
about xsl:analyze-string.  (And of course you've left some typos in 
to make sure I'm awake: analyze and regex, I'm assuming, rather than
analyse and regexp.)
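For the record, here is a minimal sketch of David's templates with those two names corrected (the instruction is xsl:analyze-string and its attribute is regex), wrapped in a complete stylesheet. The identity template is an addition of mine so that elements other than l are copied through; it needs an XSLT 2.0 processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Added for completeness: nodes not matched by the templates
       below are copied through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <xsl:template match="l">
    <l>
      <xsl:apply-templates/>
    </l>
  </xsl:template>

  <!-- Wrap each run of word characters in <w>; everything else stays text -->
  <xsl:template match="l/text()"> <!-- or l//text() according to taste -->
    <xsl:analyze-string select="." regex="\w+">
      <xsl:matching-substring><w><xsl:value-of select="."/></w></xsl:matching-substring>
      <xsl:non-matching-substring><xsl:value-of select="."/></xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```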

Ok.  This *basically* works, but with a line like:

<l>Why ha<supplied>l</supplied>dest &thorn;u were agaynes me</l>

it turns it into:

<l><w>Why</w> <w>ha</w><supplied>l</supplied><w>dest</w> <w>&thorn;u</w>
<w>were</w> <w>agaynes</w> <w>me</w></l>

or, if I change the match pattern to l//text():

<l><w>Why</w> <w>ha</w><supplied><w>l</w></supplied><w>dest</w>
<w>&thorn;u</w> <w>were</w> <w>agaynes</w> <w>me</w></l>

When really:

<l><w>Why</w> <w>ha<supplied>l</supplied>dest</w> <w>&thorn;u</w>
<w>were</w> <w>agaynes</w> <w>me</w></l>

is what is wanted.  (In this case the supplied letter is part of the
word rather than a separate word.)
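One possible approach, sketched here under stated assumptions rather than taken from the thread: treat only whitespace as a word boundary, flatten the children of l into a sequence of word fragments, whitespace markers and child elements, then use XSLT 2.0's xsl:for-each-group to wrap each run of non-whitespace items in a single w. The ws-marker element is a hypothetical temporary name, and the sketch assumes the elements are in no namespace, as in the example line.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy everything other than l through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <xsl:template match="l">
    <l>
      <!-- Step 1: flat sequence. Each whitespace run in a text child of l
           becomes a hypothetical <ws-marker> element, each non-whitespace
           run stays a separate text node, and child elements such as
           <supplied> are copied as they are. -->
      <xsl:variable name="pieces" as="node()*">
        <xsl:for-each select="node()">
          <xsl:choose>
            <xsl:when test="self::text()">
              <xsl:analyze-string select="." regex="\s+">
                <xsl:matching-substring>
                  <ws-marker><xsl:value-of select="."/></ws-marker>
                </xsl:matching-substring>
                <xsl:non-matching-substring>
                  <xsl:value-of select="."/>
                </xsl:non-matching-substring>
              </xsl:analyze-string>
            </xsl:when>
            <xsl:otherwise>
              <xsl:copy-of select="."/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each>
      </xsl:variable>
      <!-- Step 2: each maximal run of non-marker items is one word and is
           wrapped in <w>; marker runs are written back as the original
           whitespace. -->
      <xsl:for-each-group select="$pieces" group-adjacent="exists(self::ws-marker)">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <xsl:value-of select="current-group()" separator=""/>
          </xsl:when>
          <xsl:otherwise>
            <w><xsl:copy-of select="current-group()"/></w>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </l>
  </xsl:template>
</xsl:stylesheet>
```

On the example line this should give the ha<supplied>l</supplied>dest grouping shown above. The trade-offs: whitespace inside a child element such as supplied never splits a word, child elements are copied rather than passed to other templates, and, unlike the \w+ version, punctuation attached to a word ends up inside its w.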
---
Dr James Cummings, Oxford Text Archive, University of Oxford
James dot Cummings at oucs dot ox dot ac dot uk 
CALL FOR PAPERS: Digital Medievalism (Kalamazoo) and 
Early Drama (Leeds) see http://users.ox.ac.uk/~jamesc/cfp.html
