Subject: RE: MSXML and Encoding
From: Ian Brockbank <ian@xxxxxxxxxxxxxx>
Date: Wed, 8 Sep 1999 16:03:24 +0100

Hi Steven,

> Special European Characters don't seem to work for UTF-8 (at
> least for the MSXML parser). I had a look at the W3C doc and tried
> the UTF-16 as they said it should be supported, but at the start
> of parsing it said the encoding is not supported.

Indeed, the special European characters are not single bytes in UTF-8.
Only the ASCII characters (0x00-0x7f) are encoded as themselves;
everything else takes two or more bytes.

> I have read a bit now on the UTF-8 and UTF-16 explanations as
> my knowledge of them isn't great. Does anybody have a few sentences
> to explain these ? - I am going to look at some stuff at unicode.org
> as well.

For UCS-2 values, UTF-8 characters are between 1 and 3 bytes long,
mapping as follows (it's a while since I did this, and this is from
memory, so apologies if I've not got it exactly right).

  UCS-2 char       UTF-8 mapping (bits)
  -------------    --------------------------
  0x0000-0x007f    0nnnnnnn
  0x0080-0x07ff    110nnnnn 10nnnnnn
  0x0800-0xffff    1110nnnn 10nnnnnn 10nnnnnn

where nnnnn... are the bits which build up the UCS-2 value.

Note: You can tell what type of byte you have from the first 1-4 bits

  0    - single-byte
  10   - continuation
  110  - start of 2-byte
  1110 - start of 3-byte

This means that (eg) é, if it arrives as the single byte 0xe9
(11101001 in binary, as in ISO-8859-1), is interpreted as the start
of a 3-byte character in the range 0x9000-0x9fff.

HTH,

Ian
--
Ian Brockbank, Indigo Active Vision Systems,
The Edinburgh Technopole, Bush Loan, Edinburgh EH26 0PJ
Tel: 0131-475-7234  Fax: 0131-475-7201
work: ian@xxxxxxxxxxxxxx  personal: Ian.Brockbank@xxxxxxxxxxx
web: ScottishDance@xxxxxxxxxxx  http://www.scottishdance.net/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
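
For anyone who wants to check the bit layout in the message above by hand, here is a minimal Python sketch. It is not taken from MSXML or any parser; the function names `utf8_encode_bmp` and `byte_kind` are made up for illustration, and it assumes the standard boundaries 0x007f / 0x07ff / 0xffff for the 1-, 2- and 3-byte forms. It ignores anything above 0xffff and does not reject surrogate values.

```python
def utf8_encode_bmp(cp: int) -> bytes:
    """Encode one UCS-2 value (0x0000-0xFFFF) as UTF-8 by shifting bits.

    Illustrative only: surrogates (0xD800-0xDFFF) are not rejected here.
    """
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("this sketch only handles 0x0000-0xFFFF")
    if cp <= 0x7F:
        # 0nnnnnnn: ASCII is encoded as itself
        return bytes([cp])
    if cp <= 0x7FF:
        # 110nnnnn 10nnnnnn
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    # 1110nnnn 10nnnnnn 10nnnnnn
    return bytes([
        0xE0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def byte_kind(b: int) -> str:
    """Classify a single byte by its leading bits."""
    if b & 0x80 == 0x00:
        return "single-byte"
    if b & 0xC0 == 0x80:
        return "continuation"
    if b & 0xE0 == 0xC0:
        return "2-byte lead"
    if b & 0xF0 == 0xE0:
        return "3-byte lead"
    return "other"


if __name__ == "__main__":
    # e-acute is U+00E9; as UTF-8 it needs two bytes
    print(utf8_encode_bmp(0xE9).hex())
    # a lone 0xE9 byte looks like the start of a 3-byte sequence
    print(byte_kind(0xE9))
```

The second call shows the failure mode from the message: a document that contains a raw 0xe9 byte but is parsed as UTF-8 presents the parser with what looks like the first byte of a 3-byte character.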