RE: [xsl] exslt tokenize mixed content

Subject: RE: [xsl] exslt tokenize mixed content
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 1 Nov 2006 16:31:25 -0000
Try padding each text node with a space on either side before tokenizing:

<xsl:variable name="spaced-out-nodes">
  <xsl:for-each select=".//text()">
    <xsl:text> </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text> </xsl:text>
  </xsl:for-each>
</xsl:variable>
<xsl:variable name="words" select="str:tokenize($spaced-out-nodes)"/>
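
For context, here is a minimal sketch of how the two variables above might sit in a complete stylesheet. It assumes the str prefix is bound to the EXSLT strings namespace (http://exslt.org/strings) and that the processor implements str:tokenize(), as libxslt does. The <words> and <w> output elements are hypothetical names used only to make the tokens visible.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    exclude-result-prefixes="str">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="section">
    <!-- Result tree fragment: each descendant text node, padded with a
         space on both sides so adjacent elements cannot run together -->
    <xsl:variable name="spaced-out-nodes">
      <xsl:for-each select=".//text()">
        <xsl:text> </xsl:text>
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
      </xsl:for-each>
    </xsl:variable>

    <!-- With a single argument, str:tokenize() splits on whitespace;
         the fragment is converted to its string value first -->
    <xsl:variable name="words" select="str:tokenize($spaced-out-nodes)"/>

    <!-- Hypothetical output: one <w> element per token -->
    <words>
      <xsl:for-each select="$words">
        <w><xsl:value-of select="."/></w>
      </xsl:for-each>
    </words>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample <section> in the question, "First" and "The" should come out as separate tokens, because the padding puts whitespace between the end of <title> and the start of <p>.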

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Richard Lewis [mailto:richardlewis@xxxxxxxxxxxxxx] 
> Sent: 01 November 2006 16:06
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] exslt tokenize mixed content
> 
> Hi there,
> 
> I'm using libxslt's EXSLT implementation's str:tokenize() 
> function in an attempt to tokenize all the words inside a 
> particular element. What I need to be able to do is to have 
> it generate a list of tokens for /all/ the character content 
> from inside a mixed content element:
> 
> <xsl:template match="section">
>   <xsl:variable name="words" select="str:tokenize(string(.))"/>
> </xsl:template>
> 
> This almost works except that using string() on a node 
> doesn't give you any white space between the last character 
> in one subelement and the next character following that 
> subelement. e.g.:
> 
> <section>
>   <title>Section the First</title>
>   <p>The content of this section</p>
> </section>
> 
> calling string on this gives you:
> "Section the FirstThe content of this section"
> 
> where I need a space between "First" and "The" so that 
> str:tokenize() will interpret them as separate tokens.
> 
> Any ideas?
> 
> Cheers,
> Richard
> --
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Richard Lewis
> Sonic Arts Research Archive
> http://www.sara.uea.ac.uk/
> JID: ironchicken@xxxxxxxxxxxxxxx
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
