Subject: Re: [xsl] [XSLT2.0] xsl:analyze-string@regex syntax too limited
From: Colin Paul Adams <colin@xxxxxxxxxxxxxxxxxx>
Date: 16 Dec 2004 07:25:14 +0000

>>>>> "Gunther" == Gunther Schadow <gunther@xxxxxxxxxxxxxxxxxxxxxx> writes:

    Gunther> The boundary matcher matches a zero-width substring
    Gunther> between a character matching the character class
    Gunther> [A-Za-z_0-9] and a character matching the character class
    Gunther> [^A-Za-z_0-9] or vice versa. </quote>

    Gunther> This is pretty clear. It may not make the
    Gunther> internationalization people very happy because I can't do
    Gunther> word-boundary matches on Hindi text. That's a true
    Gunther> concern.

So address it.

Unicode report TR18 says (for Level 1 support):

RL1.4 Simple Word Boundaries

To meet this requirement, an implementation shall extend the word
boundary mechanism so that:

1. The class of <word_character> includes all the Alphabetic values
   from the Unicode character database, from UnicodeData.txt [UData].
   See also Annex C: Compatibility Properties.

2. Non-spacing marks are never divided from their base characters,
   and otherwise ignored in locating boundaries.

Level 2 provides more general support for word boundaries between
arbitrary Unicode characters which may override this behavior.

Level 1 support should certainly be met.
-- 
Colin Paul Adams
Preston Lancashire
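
To make the difference concrete, here is a rough Python sketch comparing the ASCII-only boundary rule quoted above with an approximation of the two RL1.4 rules. It is illustrative only: the function names are made up for this example, the Alphabetic property is approximated by general categories L*, Nd and Nl plus the underscore (Python's unicodedata module does not expose Alphabetic directly), and all combining marks, not just non-spacing ones, are treated as belonging to the character before them.

```python
import unicodedata

# The ASCII-only word class from the quoted definition: [A-Za-z_0-9].
ASCII_WORD = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"
)


def ascii_word_flags(text):
    """One flag per character: True if it is in [A-Za-z_0-9]."""
    return [ch in ASCII_WORD for ch in text]


def unicode_word_flags(text):
    """Approximate RL1.4 word characters (an assumption, not the exact
    Alphabetic property): letters, decimal digits, letter numbers and '_'.
    A combining mark inherits the flag of the character before it, so it
    is never split from its base character."""
    flags = []
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("M"):
            flags.append(flags[-1] if flags else False)
        else:
            flags.append(
                category.startswith("L") or category in ("Nd", "Nl") or ch == "_"
            )
    return flags


def boundaries(flags):
    """Zero-width positions where a word character meets a non-word
    character (or the start or end of the string), in either order."""
    padded = [False] + flags + [False]
    return [i for i in range(len(flags) + 1) if padded[i] != padded[i + 1]]


# The Hindi word "Hindi" (six code points) followed by " text".
sample = "\u0939\u093f\u0928\u094d\u0926\u0940 text"

ascii_result = boundaries(ascii_word_flags(sample))
unicode_result = boundaries(unicode_word_flags(sample))
print(ascii_result)    # boundaries found only around "text"
print(unicode_result)  # boundaries around the Hindi word as well
```

Under the ASCII-only rule the Hindi word contains no word characters, so no boundary is reported at either end of it; under the approximated RL1.4 rules it is treated as a single word.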