Subject: Re: [xsl] XSLT 2.0 : Unicode hex notation in regular expressions
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 12 Aug 2004 12:14:56 +0100
> Sorry to insist : why don't they work ?
Because that's life:-)
> Aren't they supposed to do ?
No, the regex syntax in XSLT is (except where otherwise noted) that of
W3C XML Schema, and that doesn't have any notation like that.
> If so, is it a Saxon-related problem or a more general one that would
> indicate that UTS #18 is still to be implemented, is irrelevant or
> whatever ?
The _semantics_ of Unicode regexps come from there, e.g. the predefined
character classes (you may prefer to use a character class referring to
the Arabic block, for example, rather than explicit code points). But (I
would guess) the U+ notation wasn't supported because it is the Unicode
standard's way of referring to characters by code point in plain ASCII
text, and that is never used in an XML context. U+06FF is legal XML
character data, but it is just those 6 characters; if you want to refer
to character hex 06FF you always use &#x06FF; in XML.
> How, for example, to use a useful syntax like
> matches(.,'\p{Script:Arabic}+') ?
schema-2 says: http://www.w3.org/TR/xmlschema-2/#regexs
[Definition:] [Unicode Database] groups code points into a number of
blocks such as Basic Latin (i.e., ASCII), Latin-1 Supplement, Hangul
Jamo, CJK Compatibility, etc. The set containing all characters that
have block name X (with all white space stripped out), can be identified
with a block escape \p{IsX}. The complement of this set is specified
with the block escape \P{IsX}. ([\P{IsX}] = [^\p{IsX}]).
...
For example,
the "block escape" for identifying the ASCII characters is \p{IsBasicLatin}.
so that would be \p{IsArabic}
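
As a minimal sketch (untested here, and assuming an XSLT 2.0 processor
such as Saxon), the block escape and the character-reference form could
be tried like this. The literal test strings are made up for
illustration only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Runs against any input document; only the literal strings below are tested. -->
  <xsl:template match="/">
    <!-- Block escape from XML Schema Part 2: every character must lie in the Arabic block. -->
    <xsl:value-of select="matches('&#x0633;&#x0644;&#x0627;&#x0645;', '^\p{IsArabic}+$')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Explicit code points: the XML parser expands the character references
         before the regex is compiled, so the regex sees a plain character range. -->
    <xsl:value-of select="matches('&#x06FF;', '^[&#x0600;-&#x06FF;]+$')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- 'U+06FF' is just six ASCII characters, none of them in the Arabic block. -->
    <xsl:value-of select="matches('U+06FF', '\p{IsArabic}')"/>
  </xsl:template>
</xsl:stylesheet>
```

The three tests should give true, true and false respectively.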
David