Subject: Re: [xsl] [XSLT2.0] xsl:analyze-string@regex syntax too limited
From: Gunther Schadow <gunther@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 16 Dec 2004 18:14:56 -0500
Thanks, good find. The only problem now is that this issue needs to be
addressed in java.util.regex.

Colin Paul Adams wrote:
>>>>>> "Gunther" == Gunther Schadow <gunther@xxxxxxxxxxxxxxxxxxxxxx> writes:
>
>     Gunther> The boundary matcher matches a zero-width substring
>     Gunther> between a character matching the character class
>     Gunther> [A-Za-z_0-9] and a character matching the character class
>     Gunther> [^A-Za-z_0-9] or vice versa. </quote>
>
>     Gunther> This is pretty clear. It may not make the
>     Gunther> internationalization people very happy because I can't do
>     Gunther> word-boundary matches on Hindi text. That's a true
>     Gunther> concern.
>
> So address it. Unicode report TR18 says (for Level 1 support):
>
> RL1.4 Simple Word Boundaries
> To meet this requirement, an implementation shall extend the word
> boundary mechanism so that:
>
>    1. The class of <word_character> includes all the Alphabetic values
>       from the Unicode character database, from UnicodeData.txt [UData].
>       See also Annex C: Compatibility Properties.
>    2. Non-spacing marks are never divided from their base characters,
>       and otherwise ignored in locating boundaries.
>
> Level 2 provides more general support for word boundaries between
> arbitrary Unicode characters which may override this behavior.
>
> Level 1 support should certainly be met.

--
Gunther Schadow, M.D., Ph.D.          gschadow@xxxxxxxxxxxxxxx
Associate Professor                   Indiana University School of Informatics
Regenstrief Institute, Inc.           Indiana University School of Medicine
tel:1(317)630-7960                    http://aurora.regenstrief.org
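
To make the difference between the two boundary definitions quoted above concrete, here is a minimal Java sketch. It does not use the built-in \b at all; it spells out each definition with lookarounds so the behavior does not depend on how a given java.util.regex version implements \b. The class name WordBoundarySketch, the helper countBoundaries, and both pattern constants are made up for illustration. The Unicode pattern is only an approximation of RL1.4: it assumes that treating non-spacing marks (\p{Mn}) as word characters is close enough to "never divided from their base characters", and it adds decimal digits and the underscore to mirror the ASCII class. Like the quoted definition, neither pattern reports a boundary at the very start or end of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundarySketch {
    // Boundary exactly as quoted: a zero-width position between a character
    // in [A-Za-z_0-9] and a character in [^A-Za-z_0-9], or vice versa.
    static final Pattern ASCII_BOUNDARY = Pattern.compile(
        "(?<=[A-Za-z_0-9])(?=[^A-Za-z_0-9])"
        + "|(?<=[^A-Za-z_0-9])(?=[A-Za-z_0-9])");

    // Assumed approximation of TR18 RL1.4: word characters are everything
    // with the Unicode Alphabetic property, plus non-spacing marks (so a
    // mark is never split from its base), decimal digits, and underscore.
    static final String WORD = "[\\p{IsAlphabetic}\\p{Mn}\\p{Nd}_]";
    static final String NON_WORD = "[^\\p{IsAlphabetic}\\p{Mn}\\p{Nd}_]";
    static final Pattern UNICODE_BOUNDARY = Pattern.compile(
        "(?<=" + WORD + ")(?=" + NON_WORD + ")"
        + "|(?<=" + NON_WORD + ")(?=" + WORD + ")");

    // Counts the zero-width boundary positions the pattern finds in text.
    static int countBoundaries(Pattern boundary, String text) {
        Matcher m = boundary.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String english = "foo bar";
        String hindi = "\u0939\u093F\u0928\u094D\u0926\u0940 \u092D\u093E\u0937\u093E"; // two Hindi words

        System.out.println("ASCII rule, English:   " + countBoundaries(ASCII_BOUNDARY, english));
        System.out.println("ASCII rule, Hindi:     " + countBoundaries(ASCII_BOUNDARY, hindi));
        System.out.println("Unicode rule, English: " + countBoundaries(UNICODE_BOUNDARY, english));
        System.out.println("Unicode rule, Hindi:   " + countBoundaries(UNICODE_BOUNDARY, hindi));
    }
}
```

Under the ASCII rule the Hindi string contains no word characters at all, so no boundary can be located in it, which is the concern raised in the quoted text. The \p{IsAlphabetic} property syntax requires Java 7 or later.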