RE: [xsl] Another tokenize() question

Subject: RE: [xsl] Another tokenize() question
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 11 Aug 2004 12:12:15 +0100
> On Wed, 11 Aug 2004, David Carlisle wrote:
> 
> > regex="(\w|{{[^{{}}]*}})+"
> 
> Now I know I'm being a pain, but when I use that I get:
> 
> saxon -o temp2.xml temp.xml addwords2.xsl
> Error at analyze-string on line 29 of file:addwords2.xsl:
>   net.sf.saxon.type.RegexTranslator$RegexSyntaxException: Error at
>   character 4 in regular expression: expected ())
> Transformation failed: Run-time errors were reported
> 

Leaving the AVT rules aside for a moment: the regex syntax allows { and } to
be used without escaping inside [], but not outside, where { is reserved for
regex quantifiers such as x{3}. Outside [] they must therefore be escaped as
\{ and \}. So the regular expression you want is

(\w|\{[^{}]*\})+

which, because the regex attribute is an AVT and each literal curly brace
must be doubled, is written in the attribute as

regex="(\w|\{{[^{{}}]*\}})+"
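
For context, here is a minimal sketch of how that attribute might sit in a
complete XSLT 2.0 stylesheet. The element names text, words and word are
hypothetical and are not taken from the original addwords2.xsl; the use of
select="." is likewise an assumption.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input element; substitute whatever holds the text. -->
  <xsl:template match="text">
    <words>
      <!-- The regex attribute is an AVT, so every literal brace is doubled.
           After AVT processing the regex engine sees: (\w|\{[^{}]*\})+ -->
      <xsl:analyze-string select="." regex="(\w|\{{[^{{}}]*\}})+">
        <xsl:matching-substring>
          <!-- A run of word characters and/or brace-delimited groups. -->
          <word><xsl:value-of select="."/></word>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <!-- Whitespace and punctuation are copied through unchanged. -->
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </words>
  </xsl:template>

</xsl:stylesheet>
```

With an input such as <text>foo {bar baz} qux</text>, this should wrap foo,
{bar baz} and qux each in a word element and leave the spaces between them
as plain text.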

It might be less painful to do:

<xsl:variable name="regex">(\w|\{[^{}]*\})+</xsl:variable>
<xsl:analyze-string regex="{$regex}">

though sadly, I suspect that will prevent Saxon precompiling the regular
expression :-(

Better idea: use chevrons instead of curlies.

Michael Kay
