Subject: Re: [xsl] CJK UTF-16 test
From: Mike Brown <mike@xxxxxxxx>
Date: Wed, 28 Mar 2001 21:35:34 -0700 (MST)

Benjamin Franz wrote:
> XML does NOT support UTF-16 since UTF-16 includes the surrogates

Wow, strike that from the archives, because it's dead wrong.

XML is specified in terms of sequences of allowable ISO/IEC 10646-1
characters, not particular binary-encoded representations of those
characters.

> [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
>              [#x10000-#x10FFFF]

These are characters, not UTF-16 bytes. In ISO/IEC 10646-1 and Unicode
_there is no character_ at code point 0xD800. And in a UTF-16 encoded
document, the bit sequence that I would write in hex as D800 (big endian)
or 00D8 (little endian) is not a character. The *sequence* D800 DC00 (big
endian) represents character #x10000, which I write here using the same
notation as the EBNF excerpt you quoted from the XML spec.

If you were to say that an XML document can contain a "character" #xD800
then you would a.) be in violation of the definition of character, which
XML takes from ISO/IEC 10646-1, and b.) have no way of representing that
character in a UTF-16 encoded document, because by definition, D800 in
UTF-16 is the first half of a surrogate pair, not a character...

   - Mike
_____________________________________________________________________________
mike j. brown, software engineer at  |  xml/xslt: http://skew.org/xml/
webb.net in denver, colorado, USA    |  personal: http://hyperreal.org/~mike/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
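
For anyone who wants to check the distinction between characters and UTF-16 code units themselves, here is a minimal Python sketch. It is an illustration only, not part of the original post: the helper names utf16_surrogates and is_xml_char are hypothetical, and is_xml_char simply transcribes the Char production quoted above.

```python
def utf16_surrogates(code_point):
    """Split a code point above #xFFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above #xFFFF need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # first (high) surrogate code unit
    low = 0xDC00 + (offset & 0x3FF)   # second (low) surrogate code unit
    return high, low


def is_xml_char(code_point):
    """Hypothetical helper: test a code point against production [2] Char."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


# Character #x10000 is a legal XML character, carried in UTF-16 as D800 DC00.
print([hex(unit) for unit in utf16_surrogates(0x10000)])
print(chr(0x10000).encode("utf-16-be").hex())

# #xD800 is not a character, and a lone surrogate cannot be encoded as UTF-16.
print(is_xml_char(0x10000), is_xml_char(0xD800))
try:
    chr(0xD800).encode("utf-16-be")
except UnicodeEncodeError as err:
    print("cannot encode #xD800:", err.reason)
```

The Char production excludes the range #xD800-#xDFFF precisely because those values are code units used by UTF-16, not characters in their own right.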