Re: size?

Subject: Re: size?
From: James Clark <jjc@xxxxxxxxxx>
Date: Fri, 14 May 1999 13:26:39 +0700
Kay Michael wrote:
> 
> > -----Original Message-----
> > From: Steve Muench [mailto:SMUENCH@xxxxxxxxxxxxx]
> > It turns
> > out that the notion of the "length" of a string is
> > naturally and conveniently defined if you restrict
> > yourself to single-byte character sets, but for multibyte
> > character sets the notion of "length" is less well-defined.
> 
> The number of characters in a string is perfectly well-defined in XML.

The XML spec says "At user option, processors may normalize such
characters to some canonical form."  Normalization can change the number
of characters in a string (by composing or decomposing characters).
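
To make the normalization point concrete, here is a minimal sketch in Python. Python and its standard unicodedata module are not part of the original discussion; they are used only as a convenient way to show that the same text has a different character count depending on whether it is composed or decomposed. The sample word is an arbitrary choice.

```python
import unicodedata

# "café" with the last letter as one precomposed character (U+00E9).
composed = "caf\u00e9"

# Canonical decomposition (NFD) splits it into "e" + combining acute (U+0301).
decomposed = unicodedata.normalize("NFD", composed)

# Same text to a reader, different number of characters.
composed_len = len(composed)      # 4
decomposed_len = len(decomposed)  # 5

# Canonical composition (NFC) maps the decomposed form back again.
recomposed = unicodedata.normalize("NFC", decomposed)

print(composed_len, decomposed_len, recomposed == composed)
```

So a processor that normalizes and one that does not can legitimately report different lengths for what the author thinks of as the same string.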

Another problem is with non-BMP characters (surrogate pairs).  In XML
these are treated as a single character, but the DOM counts them as two
characters.
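
The two ways of counting can be sketched as follows, again in Python as an assumed illustration language. Python 3 strings count code points, which matches the XML view; counting UTF-16 code units reproduces the DOM view. The sample character (U+1D11E) is an arbitrary non-BMP choice.

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
s = "\U0001D11E"

# XML view: one character (one code point).
code_points = len(s)

# DOM view: the string is a sequence of 16-bit units, so the same
# character is stored as a surrogate pair and counts as two.
utf16 = s.encode("utf-16-le")
utf16_units = len(utf16) // 2

# The two units are the high and low surrogates.
high = int.from_bytes(utf16[0:2], "little")
low = int.from_bytes(utf16[2:4], "little")

print(code_points, utf16_units, hex(high), hex(low))
```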

> It
> might not be exactly the definition that an expert in Ethiopian or
> Glagolitic might like, but it would be good enough for the rest of us.

It's more a matter of putting in a definition that speakers of many
non-English languages would find counter to their established cultural
conventions. Imagine a spec that counted the letters "i" and "j" as two
characters and every other English character as one character.

James


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

