Re: [xsl] XSLT 2.0 : Unicode hex notation in regular expressions

From: Pierrick Brihaye <pierrick.brihaye@xxxxxxxxxx>
Date: Thu, 12 Aug 2004 12:37:04 +0200
David Carlisle wrote:

> [\\u0600-\\u06FF]
>
>
> \\ is a literal \ so I  that matches
>  any one of characters \ u 0 6 F and all characters in the range  0 to \,
>  except that 0 is char 48 and / is char 47 so this range is empty.

OK, got it. I now know why ":" matches [\\u0600-\\u06FF]: the colon is char 58 (x3A), which falls between zero, char 48 (x30), and the backslash, char 92 (x5C).
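To see the effect in isolation, here is a minimal sketch of a stylesheet (XSLT 2.0, any input document will do; the layout is mine and purely illustrative). The class holds the literal characters \ u 0 6 F plus the range from 0 to \, so I would expect the first test to print true and the second, on a real Arabic letter, to print false:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- ':' is char 58, inside the accidental range 0 (48) to \ (92) -->
    <xsl:value-of select="matches(':', '^[\\u0600-\\u06FF]$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- U+0628 ARABIC LETTER BEH is not a member of this class at all -->
    <xsl:value-of select="matches('&#x0628;', '^[\\u0600-\\u06FF]$')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```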

> You don't need the u-notation to enter  code points into regexp (and
> they don't work)

Sorry to insist: why don't they work? Aren't they supposed to?

If so, is it a Saxon-specific problem, or a more general one, which would suggest that UTS #18 has yet to be implemented, or is simply not relevant here?

How, for example, would one use a handy syntax like matches(.,'\p{Script:Arabic}+')?

> as you can just enter the characters directly

Hmm... not always easy, because of control characters. For Arabic, see http://www.fileformat.info/info/unicode/char/0600/index.htm.

> or if
> you want an ascii representation use xml character references,
> & # x a b c ;

Indeed. <xsl:when test="matches(.,'[&#x0600;-&#x06FF;]+')">arabic</xsl:when> gives me the expected result. Thanks for the reminder!
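For the archive, a self-contained sketch of that test in context. The element name "word" and the "other" branch are hypothetical, chosen only for illustration; the point is that the XML parser expands the character references before the regex is ever compiled, so the class really is the range U+0600 to U+06FF:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>
  <!-- "word" is a hypothetical element name used only for this sketch -->
  <xsl:template match="word">
    <xsl:choose>
      <!-- true if the content contains at least one char in U+0600..U+06FF -->
      <xsl:when test="matches(., '[&#x0600;-&#x06FF;]+')">arabic</xsl:when>
      <xsl:otherwise>other</xsl:otherwise>
    </xsl:choose>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

As far as I can tell from XML Schema Part 2, the regex flavour that XPath 2.0 builds on also defines block escapes of the form \p{IsBlockName}, so '\p{IsArabic}+' should cover the same block. That variant is untested in this sketch and assumes the processor implements block escapes.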

Cheers,

p.b.
