Subject: Re: [xsl] XSLT 2.0 : Unicode hex notation in regular expressions
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 12 Aug 2004 12:14:56 +0100
> Sorry to insist : why don't they work ?

Because that's life :-)

> Aren't they supposed to do ?

No, the syntax in XSLT is (except where otherwise noted) that of W3C XML Schema, and that doesn't have any notation like that.

> If so, is it a Saxon-related problem or a more general one that would
> indicate that UTS #18 is still to be implemented, is irrelevant or
> whatever ?

The _semantics_ of Unicode regexps come from there, e.g. the predefined character classes (you may prefer to use a character class referring to the Arabic block, for example, rather than use explicit code points), but (I would guess) the U notation wasn't supported because that is the Unicode standard way of referring to characters by code point in plain ASCII text, and that is never used in an XML context. U+06FF is legal XML character data, but it is just those 6 characters; if you want to refer to character hex 06FF, you always use &#x06FF; in XML.

> How, for example, to use a useful syntax like matches(.,'\p{Script:Arabic}+') ?

schema-2 says:

http://www.w3.org/TR/xmlschema-2/#regexs

  [Definition:] [Unicode Database] groups code points into a number of blocks such as Basic Latin (i.e., ASCII), Latin-1 Supplement, Hangul Jamo, CJK Compatibility, etc. The set containing all characters that have block name X (with all white space stripped out), can be identified with a block escape \p{IsX}. The complement of this set is specified with the block escape \P{IsX}. ([\P{IsX}] = [^\p{IsX}]).
  ...
  For example, the ·block escape· for identifying the ASCII characters is \p{IsBasicLatin}.

so that would be \p{IsArabic}

David
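
To make the three notations discussed above concrete, here is a minimal sketch of an XSLT 2.0 stylesheet. It is only an illustration, not tested against any particular processor release: the variable name `sample` and its four Arabic letters are made up for the example, and it assumes a processor that implements the XPath 2.0 `matches()` function with the XML Schema block escapes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hypothetical sample string: four letters from the Arabic block,
         written as XML character references -->
    <xsl:variable name="sample"
                  select="'&#x0633;&#x0644;&#x0627;&#x0645;'"/>

    <!-- 1. Block escape from XML Schema regex syntax: \p{IsArabic} -->
    <xsl:value-of select="matches($sample, '^\p{IsArabic}+$')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- 2. Explicit code point range. The XML parser expands the
         character references before the regex engine sees the pattern -->
    <xsl:value-of select="matches($sample, '^[&#x0600;-&#x06FF;]+$')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- 3. "U+06FF" in a pattern is just literal characters, so it can
         only match that literal text (the + has to be escaped) -->
    <xsl:value-of select="matches('U+06FF', 'U\+06FF')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet ignores its input, so it can be run against any well-formed XML document. The block escape is usually the more readable choice; the explicit range is there only to show that code points are reached through XML character references, not through a regex-level hex notation.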