[xsl] exslt tokenize mixed content

Subject: [xsl] exslt tokenize mixed content
From: Richard Lewis <richardlewis@xxxxxxxxxxxxxx>
Date: Wed, 1 Nov 2006 16:05:36 +0000
Hi there,

I'm using the str:tokenize() function from libxslt's EXSLT implementation to 
tokenize all the words inside a particular element. What I need is a list of 
tokens for /all/ the character content inside a mixed content element:

<xsl:template match="section">
  <xsl:variable name="words" select="str:tokenize(string(.))" />
  <!-- ... process $words ... -->
</xsl:template>

This almost works, except that string() on a node simply concatenates its 
descendant text nodes, so there is no white space between the last character 
of one subelement and the first character following that subelement. E.g.:

  <title>Section the First</title>
  <p>The content of this section</p>

Calling string() on this gives you:
"Section the FirstThe content of this section"

where I need a space between "First" and "The" so that str:tokenize() will 
interpret them as separate tokens.
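
For reference, here is a minimal sketch of a complete stylesheet that should reproduce the behaviour. It makes some assumptions that are not stated above: that whitespace-only text nodes between the child elements are stripped (the xsl:strip-space line), that the output is plain text with one token per line, and that the input is a single <section> wrapping the <title> and <p> shown earlier.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    extension-element-prefixes="str">

  <xsl:output method="text"/>

  <!-- Assumption: whitespace-only text nodes directly inside <section>
       are stripped, so nothing separates </title> from <p>. -->
  <xsl:strip-space elements="section"/>

  <xsl:template match="section">
    <!-- string(.) concatenates all descendant text nodes of <section> -->
    <xsl:variable name="words" select="str:tokenize(string(.))"/>

    <!-- Print each token on its own line so the run-together
         token is easy to spot. -->
    <xsl:for-each select="$words">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run with something like "xsltproc tokens.xsl section.xml" (both file names are hypothetical). "First" and "The" should come out as the single token "FirstThe".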

Any ideas?

Richard Lewis
Sonic Arts Research Archive
JID: ironchicken@xxxxxxxxxxxxxxx
