Subject: Re: Unicode and XSL (was substring())
From: James Clark <jjc@xxxxxxxxxx>
Date: Sun, 06 Jun 1999 11:28:26 +0700

Paul Prescod wrote:
>
> David Carlisle wrote:
> >
> > Harder are characters out of the basic plane of unicode. These are a
> > single character in XML eg accessed by a single &#1234; but since
> > they don't fit into 16bits, they take up two slots when the unicode
> > is encoded in utf-16. So the natural thing to do is to count these
> > characters as single characters, but that means string indexing requires
> > walking the string and thus proportional to the index rather than being
> > a constant time array lookup. It also means that indexing and string
> > length give different values if you use a `pure XML' approach or if
> > you escape out to some language that treats strings as an array of 16bit
> > quantities.
>
> Why are you worrying about the encoding? If your programming language is
> broken in its handling of the platonic ideal concept of characters then
> that is the XSL implementor's problem. There are ways of getting this
> right: you can just use 32 bit characters or you can switch your character
> width or iteration algorithm based on the actual contents of a string. This
> isn't trivial but it is an implementor's problem and should not be
> reflected in XSL.

I basically agree with this. Counting characters by counting the
16-bit quantities that encode characters in UTF-16 makes about as much
sense as counting characters by counting the 8-bit quantities that
encode characters in UTF-8 (which would mean, for example, that a dollar
sign counts as one character and a pound sterling sign counts as two
characters).

The counter-argument to this is that the DOM counts using UTF-16. I
would respond by saying that the DOM is not counting characters but
counting 16-bit quantities; there's nothing wrong with counting 16-bit
quantities, any more than there is with counting 8-bit quantities. It
just isn't the same thing as counting characters. The XML Rec defines
what a character is for XML, and that is what we should count.
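
A minimal sketch of the distinction, in Python for convenience. The
function names are made up for illustration, and the sketch assumes
well-formed UTF-16 (no unpaired surrogates). It counts the same strings
three ways and shows the walk needed to find a character's position in
a UTF-16 array:

```python
def count_code_points(s: str) -> int:
    # A Python str is a sequence of Unicode code points.
    return len(s)

def utf16_units(s: str) -> list[int]:
    # Encode without a byte-order mark and split into 16-bit quantities.
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

def count_utf16_units(s: str) -> int:
    return len(utf16_units(s))

def count_utf8_units(s: str) -> int:
    # Number of 8-bit quantities in the UTF-8 encoding.
    return len(s.encode("utf-8"))

def unit_offset_of_char(units: list[int], char_index: int) -> int:
    # Walk from the start: cost is proportional to char_index,
    # not a constant-time array lookup.
    offset = 0
    for _ in range(char_index):
        # A high surrogate (0xD800-0xDBFF) starts a two-unit pair.
        offset += 2 if 0xD800 <= units[offset] <= 0xDBFF else 1
    return offset

samples = {
    "dollar": "$",            # U+0024
    "pound": "\u00a3",        # U+00A3, pound sterling sign
    "non_bmp": "\U00010000",  # first character outside the basic plane
}

# (code points, UTF-16 quantities, UTF-8 quantities) for each sample
counts = {
    name: (count_code_points(s), count_utf16_units(s), count_utf8_units(s))
    for name, s in samples.items()
}

# Character index 2 ("b") sits at 16-bit slot 3, because the non-BMP
# character before it occupies two slots.
offset_of_b = unit_offset_of_char(utf16_units("a\U00010000b"), 2)
```

Only the first count is the same for all three samples; the other two
depend on the encoding, which is the point of the argument above.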

James

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list