Re: [xsl] Another tokenize() question

From: David Carlisle <davidc@xxxxxxxxx>
Date: Wed, 11 Aug 2004 10:00:45 +0100
> analyze and regex
Apart from my usual grotty level of typing, I have to cope with
American spelling as well...

> When really:
> 
> <l><w>Why</w> <w>ha<supplied>l</supplied>dest</w> <w>&thorn;u</w>
> <w>were</w> <w>agaynes</w> <w>me</w></l>

> is what is wanted. 


Some people want everything :-)

Handling mixed content _across_ element boundaries is a bit more
complicated. I'd probably do something like this.
First, have a mode "a" that does:

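<!-- Replace each child element by "{name content}" so the whole line
     becomes one string; text nodes pass through via the built-in rule -->
<xsl:template mode="a" match="*">
<xsl:text>{</xsl:text>
<xsl:value-of select="name()"/><!-- perhaps attributes too if you need them -->
<xsl:text> </xsl:text>
<xsl:value-of select="."/>
<xsl:text>}</xsl:text>
</xsl:template>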
 
Then run that inside a variable as a first pass (applying templates in
mode "a" to the children of each <l>), so you get

 <l>Why ha{supplied l}dest
 &thorn;u were agaynes me</l>
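A minimal sketch of this first pass as a complete stylesheet, assuming
an XSLT 2.0 processor (such as Saxon) and an unmarked input line like
<l>Why ha<supplied>l</supplied>dest &thorn;u were agaynes me</l>.
The identity template and the template for l are illustrative
additions, not part of the approach described above; text nodes get
through mode "a" via the built-in text() rule, which applies in every
mode.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity copy for everything not otherwise matched (illustrative). -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- First pass: flatten each <l> into one string held in a variable. -->
  <xsl:template match="l">
    <xsl:variable name="flat">
      <xsl:apply-templates mode="a"/>
    </xsl:variable>
    <!-- Output the flattened string so the markers can be inspected. -->
    <l><xsl:value-of select="$flat"/></l>
  </xsl:template>

  <!-- Mode "a": an element becomes {name content}. -->
  <xsl:template mode="a" match="*">
    <xsl:text>{</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>}</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

On the sample line this should output the flattened form shown above,
with {supplied l} standing in for the <supplied> element.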

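Then do your main ana-whatist-thingy (xsl:analyze-string), making sure
that anything inside braces, i.e. matching "\{[^{}]*\}", is part of the
"word" regexp. (Braces outside a character class must be escaped in an
XPath regex, and since the regex attribute is an attribute value
template, any literal brace written there must also be doubled, or the
regex passed in from a variable.)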


Then you would have

<l><w>Why</w> <w>ha{supplied l}dest</w>
<w>&thorn;u</w> <w>were</w> <w>agaynes</w> <w>me</w></l>

except that, instead of using value-of in the matching-substring part
where you are adding <w>, do another nested analyze-string and match on
"\{([a-z]+) ([^{}]*)\}"
and put the <supplied> element back by using
<xsl:element name="{regex-group(1)}">
  <xsl:value-of select="regex-group(2)"/>
</xsl:element>
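Putting the pieces together, here is a sketch of the whole thing as one
XSLT 2.0 stylesheet. It assumes the marker element names are unprefixed
lowercase names and that the text itself contains no literal { or };
the variable names word and marker are hypothetical. Keeping the regexes
in variables sidesteps the brace doubling the regex attribute would
otherwise need.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- A word is a run of non-space, non-brace characters, where a whole
       {...} marker may appear anywhere inside it. -->
  <xsl:variable name="word" select="'(\{[^{}]*\}|[^\s{}])+'"/>
  <!-- A marker: group 1 is the element name, group 2 its text. -->
  <xsl:variable name="marker" select="'\{([a-z]+) ([^{}]*)\}'"/>

  <!-- Identity copy for everything outside <l> (illustrative). -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="l">
    <!-- First pass: flatten the line, elements become {name text}. -->
    <xsl:variable name="flat">
      <xsl:apply-templates mode="a"/>
    </xsl:variable>
    <l>
      <!-- Main pass: wrap each word (markers included) in <w>. -->
      <xsl:analyze-string select="$flat" regex="{$word}">
        <xsl:matching-substring>
          <w>
            <!-- Nested pass: turn each marker back into an element. -->
            <xsl:analyze-string select="." regex="{$marker}">
              <xsl:matching-substring>
                <xsl:element name="{regex-group(1)}">
                  <xsl:value-of select="regex-group(2)"/>
                </xsl:element>
              </xsl:matching-substring>
              <xsl:non-matching-substring>
                <xsl:value-of select="."/>
              </xsl:non-matching-substring>
            </xsl:analyze-string>
          </w>
        </xsl:matching-substring>
        <!-- Whitespace between words is copied through unchanged. -->
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </l>
  </xsl:template>

  <!-- Mode "a": an element becomes {name content}. -->
  <xsl:template mode="a" match="*">
    <xsl:text>{</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>}</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run on the unmarked sample line, this should give the <w> markup asked
for in the quoted message, with <supplied>l</supplied> back inside the
second word.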


David
