Subject: RE: [xsl] XSLT 2.0 : Unicode hex notation in regular expressions
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Thu, 12 Aug 2004 12:12:08 +0100

The notation \u1234 is not supported in XPath 2.0 regular expressions.
Use &#x1234; instead.

Michael Kay

> -----Original Message-----
> From: Pierrick Brihaye [mailto:pierrick.brihaye@xxxxxxxxxx]
> Sent: 12 August 2004 10:38
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] XSLT 2.0 : Unicode hex notation in regular expressions
>
> Hi,
>
> I don't know if my XSLT syntax is wrong or if it is a Saxon-related
> problem. Let's blame the XSLT writer rather than the XSLT processor
> first ;-)
>
> Given the following XML :
>
> <?xml version="1.0" encoding="UTF-8"?>
> <text>livre : كتاب</text>
>
> And the following XSLT :
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet version="2.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>   <xsl:template match="/text">
>     <xsl:comment><xsl:value-of select="system-property('xsl:vendor')"/></xsl:comment>
>     <words>
>       <xsl:for-each select="tokenize(.,'\s+')">
>         <word>
>           <xsl:attribute name="language">
>             <xsl:choose>
>               <xsl:when test="matches(.,'[a-z]+')">latin</xsl:when>
>               <xsl:when test="matches(.,'[\\u0600-\\u06FF]+')">arabic</xsl:when>
>               <xsl:otherwise>whatever</xsl:otherwise>
>             </xsl:choose>
>           </xsl:attribute>
>           <xsl:attribute name="codepoints"><xsl:value-of select="string-to-codepoints(.)"/></xsl:attribute>
>           <xsl:value-of select="."/>
>         </word>
>       </xsl:for-each>
>     </words>
>   </xsl:template>
> </xsl:stylesheet>
>
> I get :
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!--SAXON 8.0 from Saxonica-->
> <words>
>   <word language="latin" codepoints="108 105 118 114 101">livre</word>
>   <word language="arabic" codepoints="58">:</word>
>   <word language="whatever" codepoints="1603 1578 1575 1576">كتاب</word>
> </words>
>
> Why this curious match for codepoint 58 ? And why no match for the
> arabic characters ?
>
> BTW, I first tried : matches(.,'[\u0600-\u06FF]+') as stated by
> http://www.unicode.org/reports/tr18/#Hex_notation
>
> But Saxon returned the following error :
>
> Error at xsl:when on line 11 of file:/C:/...:
> net.sf.saxon.type.RegexTranslator$RegexSyntaxException: Error at
> character 2 in regular expression: bad escape sequence
>
> That's why I doubled the "\" character. Is this doubling
> spec-compliant ?
>
> Cheers,
>
> p.b.
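
For illustration, here is a minimal sketch of the stylesheet from the original message with the advice from the reply applied. It assumes the Arabic range is written with XML character references (&#x0600; and &#x06FF;), which the XML parser expands before the XPath regex engine sees the pattern. Only the second xsl:when test differs from the original. This reading may also explain the odd result reported: in '[\\u0600-\\u06FF]+' the doubled backslash is an escaped literal backslash, so the class appears to contain the range 0-\ (U+0030 to U+005C), which includes ":" (codepoint 58) but no Arabic letters.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/text">
    <xsl:comment><xsl:value-of select="system-property('xsl:vendor')"/></xsl:comment>
    <words>
      <xsl:for-each select="tokenize(., '\s+')">
        <word>
          <xsl:attribute name="language">
            <xsl:choose>
              <xsl:when test="matches(., '[a-z]+')">latin</xsl:when>
              <!-- Assumption: the range is given as character references.
                   The XML parser expands them, so the regex engine sees
                   the literal characters U+0600 and U+06FF. -->
              <xsl:when test="matches(., '[&#x0600;-&#x06FF;]+')">arabic</xsl:when>
              <xsl:otherwise>whatever</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <xsl:attribute name="codepoints"><xsl:value-of select="string-to-codepoints(.)"/></xsl:attribute>
          <xsl:value-of select="."/>
        </word>
      </xsl:for-each>
    </words>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input above, this version should label "livre" as latin, ":" as whatever, and the Arabic word as arabic. If the processor supports the Unicode block escapes of XML Schema regular expressions, '\p{IsArabic}+' is probably an equivalent way to write the same test without spelling out the range.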