Re: [xsl] [XSLT2.0] xsl:analyze-string@regex syntax too limited

Subject: Re: [xsl] [XSLT2.0] xsl:analyze-string@regex syntax too limited
From: Colin Paul Adams <colin@xxxxxxxxxxxxxxxxxx>
Date: 16 Dec 2004 07:25:14 +0000
>>>>> "Gunther" == Gunther Schadow <gunther@xxxxxxxxxxxxxxxxxxxxxx> writes:

    Gunther> The boundary matcher matches a zero-width substring
    Gunther> between a character matching the character class
    Gunther> [A-Za-z_0-9] and a character matching the character class
    Gunther> [^A-Za-z_0-9] or vice versa.  </quote>

    Gunther> This is pretty clear. It may not make the
    Gunther> internationalization people very happy because I can't do
    Gunther> word-boundary matches on Hindi text. That's a true
    Gunther> concern.

So address it. Unicode Technical Report #18 (Unicode Regular Expressions) says, for Level 1 support:

RL1.4  Simple Word Boundaries
	To meet this requirement, an implementation shall extend the word boundary mechanism so that:

   1. The class of <word_character> includes all the Alphabetic values from the Unicode character database, from UnicodeData.txt [UData]. See also Annex C: Compatibility Properties.
   2. Non-spacing marks are never divided from their base characters, and otherwise ignored in locating boundaries.

Level 2 provides more general support for word boundaries between
arbitrary Unicode characters which may override this behavior.

Level 1 support should certainly be met.
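
To make the difference concrete, here is a rough Python sketch (not XSLT, and not taken from TR18 itself) that contrasts an ASCII-only boundary with an approximation of the RL1.4 rules. The function names are hypothetical. It assumes that str.isalpha() is a good enough stand-in for the Alphabetic property (it only covers the letter categories), that decimal digits and "_" stay word characters as in the ASCII class, and that every combining mark (general category M*) stays attached to its base character.

```python
import re
import unicodedata


def ascii_boundaries(text):
    """Boundaries between [A-Za-z_0-9] and everything else (ASCII-only \\b)."""
    return [m.start() for m in re.finditer(r"\b", text, re.ASCII)]


def is_word_char(ch):
    # Assumption: isalpha() approximates the Unicode Alphabetic property.
    return ch == "_" or ch.isalpha() or ch.isdecimal()


def is_mark(ch):
    # Assumption: any combining mark (Mn, Mc, Me) stays with its base character.
    return unicodedata.category(ch).startswith("M")


def simple_word_boundaries(text):
    """Approximate RL1.4: marks are never split from their base character."""
    boundaries = []
    prev_is_word = False  # start of text counts as a non-word position
    for i, ch in enumerate(text):
        if is_mark(ch):
            continue  # a mark inherits the status of its base; no boundary here
        cur_is_word = is_word_char(ch)
        if cur_is_word != prev_is_word:
            boundaries.append(i)
        prev_is_word = cur_is_word
    if prev_is_word:
        boundaries.append(len(text))  # text ends inside a word
    return boundaries


# Two Hindi words separated by a space, written with escapes:
# U+0939 U+093F U+0902 U+0926 U+0940, space, U+092D U+093E U+0937 U+093E
hindi = "\u0939\u093f\u0902\u0926\u0940 \u092d\u093e\u0937\u093e"

print(ascii_boundaries("foo bar"))        # [0, 3, 4, 7]
print(ascii_boundaries(hindi))            # []
print(simple_word_boundaries("foo bar"))  # [0, 3, 4, 7]
print(simple_word_boundaries(hindi))      # [0, 5, 6, 10]
```

The ASCII-only version finds no boundary at all in the Hindi text, while the sketch finds the start and end of each word without splitting the vowel signs from their consonants.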
-- 
Colin Paul Adams
Preston Lancashire
