RE: [xsl] XSLT 2.0 : Unicode hex notation in regular expressions

From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Thu, 12 Aug 2004 12:19:42 +0100
>  > You don't need the u-notation to enter  code points into 
> regexp (and
>  > they don't work)
> 
> Sorry to insist : why don't they work ? Aren't they supposed to do ?
> 
> If so, is it a Saxon-related problem or a more general one that would 
> indicate that UTS #18 is still to be implemented, is irrelevant or 
> whatever ?
> 

The syntax of regular expressions in XSLT and XPath is defined in the XPath
2.0 specifications. The spec isn't easy to read, because the syntax is
defined by reference to regular expressions in XML Schema, with a few
extensions. XML Schema 1.0 also had lots of bugs, but fortunately there is
now a second edition, so you no longer have to delve into all the errata.
The spec doesn't include the UTS #18 escape syntax because it's not needed:
XML already has a perfectly good way of expressing Unicode characters in
ASCII (numeric character references such as &#x20AC;), and you don't need
two different ways of doing it.
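
As a minimal sketch of the point above, assuming an XSLT 2.0 processor and any input document (the sample string and the euro sign are only illustrations), a character can be written into a pattern with an XML character reference. The XML parser resolves the reference before the regex engine ever sees the pattern, so no regex-level escape is involved:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Runs against any input document; the input itself is not used. -->
  <xsl:template match="/">
    <!-- &#x20AC; is the euro sign (U+20AC). By the time matches() is
         evaluated, the pattern contains the actual character. -->
    <xsl:value-of select="matches('Price: &#x20AC;10', '&#x20AC;[0-9]+')"/>
  </xsl:template>

</xsl:stylesheet>
```

Since matches() looks for a match anywhere in the string, this should output true.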

The XML Schema syntax, and therefore the XPath syntax, does allow reference
to Unicode character blocks: for precise details see the spec, or Chapter 11
of my XPath 2.0 Programmer's Reference which should hit the streets any day
now.
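
A short illustration of a block reference, again assuming an XSLT 2.0 processor and any input document: the \p{IsCyrillic} escape is the XML Schema way of naming the Cyrillic block, and the test character is just an example. Check the spec for the exact block names your processor accepts.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Runs against any input document; the input itself is not used. -->
  <xsl:template match="/">
    <!-- &#x416; is CYRILLIC CAPITAL LETTER ZHE (U+0416).
         \p{IsCyrillic} matches any character in the Cyrillic block. -->
    <xsl:value-of select="matches('&#x416;', '^\p{IsCyrillic}$')"/>
  </xsl:template>

</xsl:stylesheet>
```

The braces are safe here because a select attribute is not an attribute value template. In the regex attribute of xsl:analyze-string, which is one, they would have to be doubled.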

Michael Kay
