Subject: Re: size?
From: James Clark <jjc@xxxxxxxxxx>
Date: Fri, 14 May 1999 13:26:39 +0700

Kay Michael wrote:
>
> > -----Original Message-----
> > From: Steve Muench [mailto:SMUENCH@xxxxxxxxxxxxx]
> > It turns
> > out that the notion of the "length" of a string is
> > naturally and conveniently defined if you restrict
> > yourself to single-byte character sets, but for multibyte
> > character sets the notion of "length" is less well-defined.
>
> The number of characters in a string is perfectly well-defined in XML.

The XML spec says "At user option, processors may normalize such
characters to some canonical form." Normalization can change the number
of characters in a string (by composing or decomposing characters).

Another problem is with non-BMP characters (surrogate pairs). In XML
these are treated as a single character, but the DOM counts them as two
characters.

> It
> might not be exactly the definition that an expert in Ethiopian or
> Glagolitic might like, but it would be good enough for the rest of us.

It's more a matter of putting in a definition that speakers of many
non-English languages would find counter to their established cultural
conventions. Imagine a spec that counted the letters "i" and "j" as two
characters and every other English character as one character.

James

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
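
To illustrate the two points in the message (normalization changing the character count, and non-BMP characters counting as one character in XML but two in the DOM), here is a minimal Python sketch. It is not taken from the thread; the helper names `code_points` and `utf16_units` and the sample strings are assumptions chosen for illustration. It relies on Python 3 strings being sequences of Unicode code points and on the DOM's string type being defined in terms of UTF-16 code units.

```python
import unicodedata


def code_points(s: str) -> int:
    """Count Unicode code points (the XML notion of a character)."""
    return len(s)


def utf16_units(s: str) -> int:
    """Count 16-bit UTF-16 code units (how a UTF-16 based DOM string counts)."""
    return len(s.encode("utf-16-le")) // 2


# Hypothetical sample data, chosen for illustration only.
composed = "\u00e9"                                  # LATIN SMALL LETTER E WITH ACUTE
decomposed = unicodedata.normalize("NFD", composed)  # "e" followed by U+0301
non_bmp = "\U00010400"                               # a character outside the BMP

if __name__ == "__main__":
    # Same text to a reader, different lengths depending on normalization.
    print(code_points(composed), code_points(decomposed))
    # One code point, but a surrogate pair in UTF-16.
    print(code_points(non_bmp), utf16_units(non_bmp))
```

Under these assumptions, a "length" function gives different answers for the same visible text depending on whether the processor normalized it, and on whether it counts code points or UTF-16 code units.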