RE: MSXML and Encoding

Subject: RE: MSXML and Encoding
From: Ian Brockbank <ian@xxxxxxxxxxxxxx>
Date: Wed, 8 Sep 1999 17:20:37 +0100
Hi Steven,

> Very strange.
> Characters such as é or è do not get parsed.
> 
> Eg. This fails, giving a ?
> 
> <?xml version='1.0' encoding='UTF-8'?>
> <root>
> è
> </root>
> 
> giving the reason
> An Invalid character was found in text content. Line 3, Position 1 
> ?</root>

Let's look at this in hex:

3c 3f 78 6d 6c 20 76 65  72 73 69 6f 6e 3d 27 31  <?xml version='1
2e 30 27 20 65 6e 63 6f  64 69 6e 67 3d 27 55 54  .0' encoding='UT
46 2d 38 27 3f 3e 0d 0a  3c 72 6f 6f 74 3e 0d 0a  F-8'?>..<root>..
e8 0d 0a 3c 2f 72 6f 6f  74 3e 0d 0a              ...</root>..

Remember the table:

> UTF-8 mapping                   UCS-2 char
> -------------                   ----------
> 0nnnnnnn                        0x0000-0x007f
> 110nnnnn 10nnnnnn               0x0080-0x07ff
> 1110nnnn 10nnnnnn 10nnnnnn      0x0800-0xffff

All is fine for the first 3 lines of hex - everything is less than
0x80, so it corresponds to itself.

Then we hit the è.  This is 0xe8, or 11101000.  That bit pattern is
interpreted as the lead byte of a 3-byte character of the form
1000nnnn nnnnnnnn, where the next two bytes must each be of the form
10nnnnnn.  What's the next byte in the file? 0d (carriage return), or
00001101.  That doesn't start with 10, so something's gone wrong with
the UTF-8 encoding, and the processor gives an error.
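The check described above can be sketched in a few lines of Python
(my choice of language here, not anything MSXML itself uses).  The
function name first_utf8_error is hypothetical, and the sketch only
knows the three forms from the table: it ignores 4-byte sequences and
overlong encodings, so treat it as an illustration rather than a real
validator.

```python
def first_utf8_error(data: bytes):
    """Return (offset, reason) for the first malformed sequence, or None.

    Illustrative only: handles the 1-, 2- and 3-byte forms from the table.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # 0nnnnnnn: stands for itself
            need = 0
        elif lead >> 5 == 0b110:     # 110nnnnn: one continuation byte
            need = 1
        elif lead >> 4 == 0b1110:    # 1110nnnn: two continuation bytes
            need = 2
        else:
            return i, f"unexpected lead byte 0x{lead:02x}"
        for k in range(1, need + 1):
            # Every continuation byte must look like 10nnnnnn.
            if i + k >= len(data) or data[i + k] >> 6 != 0b10:
                return i, f"byte {k} after lead 0x{lead:02x} is not 10nnnnnn"
        i += need + 1
    return None


# The same bytes as the hex dump: a bare 0xe8 followed by CR LF.
doc = (b"<?xml version='1.0' encoding='UTF-8'?>\r\n"
       b"<root>\r\n\xe8\r\n</root>\r\n")
error = first_utf8_error(doc)
print(error)

# Python's own decoder rejects the document at the same offset.
try:
    doc.decode("utf-8")
    builtin_offset = None
except UnicodeDecodeError as exc:
    builtin_offset = exc.start
print(builtin_offset)
```

Both checks point at offset 48, the 0xe8 byte at the start of the
fourth line of the hex dump.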

If you want è in your document, you have to encode it into UTF-8 as a
2-byte sequence with nnnnn nnnnnn corresponding to 00011 101000 (e8),
i.e. 11000011 10101000 - c3 a8, which shows up as Ã¨ if you view the
file as Latin-1.  Alternatively you could use the entity &egrave;
(assuming this is defined in your DTD - see previous discussions).
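To make the bit-shuffling concrete, here is a small Python sketch of
the 2-byte case.  The helper name encode_two_byte is made up for this
example; in practice you would just let your editor or the built-in
encoder do the work, which the last lines use as a cross-check.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Pack an 11-bit code point into the form 110nnnnn 10nnnnnn."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("not representable as a 2-byte UTF-8 sequence")
    lead = 0b11000000 | (code_point >> 6)        # top 5 payload bits
    cont = 0b10000000 | (code_point & 0b111111)  # low 6 payload bits
    return bytes([lead, cont])


encoded = encode_two_byte(0xE8)
print(encoded.hex(" "))                  # c3 a8

# Cross-check against Python's built-in encoder.
print(encoded == "\u00e8".encode("utf-8"))   # True

# Viewed as Latin-1, the same two bytes are two separate characters.
print(encoded.decode("latin-1"))
```

If the file was saved as Latin-1 in the first place, re-saving it as
UTF-8 performs this same conversion for every character above 0x7f.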

Any clearer?

Cheers,

Ian
--
Ian Brockbank, Indigo Active Vision Systems, The Edinburgh Technopole,
Bush Loan, Edinburgh EH26 0PJ   Tel: 0131-475-7234  Fax: 0131-475-7201
work: ian@xxxxxxxxxxxxxx           personal: Ian.Brockbank@xxxxxxxxxxx
web: ScottishDance@xxxxxxxxxxx           http://www.scottishdance.net/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

