[xsl] XSLT 2.0 : Unicode hex notation in regular expressions

Hi,

I don't know if my XSLT syntax is wrong or if it is a Saxon-related problem. Let's blame the XSLT writer rather than the XSLT processor first ;-)

Given the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<text>livre : كتاب</text>

And the following XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/text">
    <xsl:comment><xsl:value-of select="system-property('xsl:vendor')" /></xsl:comment>
    <words>
      <xsl:for-each select="tokenize(.,'\s+')">
        <word>
          <xsl:attribute name="language">
            <xsl:choose>
              <xsl:when test="matches(.,'[a-z]+')">latin</xsl:when>
              <xsl:when test="matches(.,'[\\u0600-\\u06FF]+')">arabic</xsl:when>
              <xsl:otherwise>whatever</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <xsl:attribute name="codepoints"><xsl:value-of select="string-to-codepoints(.)"/></xsl:attribute>
          <xsl:value-of select="."/>
        </word>
      </xsl:for-each>
    </words>
  </xsl:template>
</xsl:stylesheet>

I get:

<?xml version="1.0" encoding="UTF-8"?>
<!--SAXON 8.0 from Saxonica-->
<words>
  <word language="latin" codepoints="108 105 118 114 101">livre</word>
  <word language="arabic" codepoints="58">:</word>
  <word language="whatever" codepoints="1603 1578 1575 1576">كتاب</word>
</words>

Why this curious match for codepoint 58 (the colon)? And why no match for the Arabic characters?
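One way to check what the doubled-backslash class actually contains is to feed the same bracket expression to a different regex engine. The sketch below uses Python's `re` only as a stand-in for the XPath 2.0 engine. It assumes that both parse this particular class the same way: `\\` inside brackets is a literal backslash and `-` is a range operator. `looks_arabic` is a hypothetical helper name. The sketch uses `re.search` because XPath `matches()` also succeeds when any substring matches.

```python
import re

# The class exactly as written in the stylesheet's second xsl:when.
# A raw string passes every backslash through to the regex engine unchanged.
DOUBLED = r'[\\u0600-\\u06FF]+'

def looks_arabic(token: str) -> bool:
    # Hypothetical helper: substring search, mirroring XPath matches().
    return re.search(DOUBLED, token) is not None

# Inside the brackets "\\" is one escaped backslash, so the class reads:
#   '\' 'u' '0' '6' '0', then the range '0'-'\' (U+0030..U+005C),
#   then 'u' '0' '6' 'F' 'F'.
# That range covers ':' (U+003A, codepoint 58) but no Arabic letter.
for token in ['livre', ':', '\u0643\u062a\u0627\u0628']:
    print(token, looks_arabic(token))
```

Run as-is, only `:` should report `True`. The class accidentally contains the range U+0030 to U+005C, which also takes in digits and uppercase ASCII letters, while no Arabic code point falls inside it.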

BTW, I first tried matches(.,'[\u0600-\u06FF]+'), following the hex notation described at http://www.unicode.org/reports/tr18/#Hex_notation

But Saxon returned the following error:

Error at xsl:when on line 11 of file:/C:/...: net.sf.saxon.type.RegexTranslator$RegexSyntaxException: Error at character 2 in regular expression: bad escape sequence

That's why I doubled the "\" character. Is this doubling spec-compliant?
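XPath 2.0 regular expressions are based on XML Schema regexes, and that syntax has no `\uXXXX` escape. This is consistent with the "bad escape sequence" error. One workaround is to write the range endpoints as XML character references, e.g. `matches(.,'[&#x0600;-&#x06FF;]+')`. The XML parser expands the references before XPath compiles the pattern, so the regex engine sees the real characters. The XML Schema block escape `\p{IsArabic}` names the same U+0600 to U+06FF range. The Python sketch below mimics the character-reference route by building the class from real characters with `chr()`. `ARABIC_CLASS` and `is_arabic` are hypothetical names, not part of any XSLT API, and Python's `re` is again only a stand-in for the XPath engine.

```python
import re

# What the regex engine receives when the stylesheet writes
#   matches(., '[&#x0600;-&#x06FF;]+')
# The XML parser has already replaced each character reference with the real
# character, so the class holds U+0600 and U+06FF themselves, not escapes.
ARABIC_CLASS = '[' + chr(0x0600) + '-' + chr(0x06FF) + ']+'

def is_arabic(token: str) -> bool:
    # Hypothetical helper: substring search, like XPath matches().
    return re.search(ARABIC_CLASS, token) is not None

for token in ['livre', ':', '\u0643\u062a\u0627\u0628']:
    print(token, is_arabic(token))
```

With the real characters in place, only the Arabic token matches, and `:` no longer does.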

Cheers,

p.b.
