Subject: RE: MSXML and Encoding
From: Ian Brockbank <ian@xxxxxxxxxxxxxx>
Date: Wed, 8 Sep 1999 17:20:37 +0100
Hi Steven,

> Very strange.
> Characters such as é or è do not get parsed.
>
> Eg. This fails, giving a ?
>
> <?xml version='1.0' encoding='UTF-8'?>
> <root>
> è
> </root>
>
> giving the reason
> An Invalid character was found in text content. Line 3, Position 1
> ?</root>

Let's look at this in hex:

3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 27 31  <?xml version='1
2e 30 27 20 65 6e 63 6f 64 69 6e 67 3d 27 55 54  .0' encoding='UT
46 2d 38 27 3f 3e 0d 0a 3c 72 6f 6f 74 3e 0d 0a  F-8'?>..<root>..
e8 0d 0a 3c 2f 72 6f 6f 74 3e 0d 0a              ...</root>..

Remember the table:

> UTF-8 mapping                 UCS-2 char
> -------------                 ----------
> 0nnnnnnn                      0x0000-0x007f
> 110nnnnn 10nnnnnn             0x0080-0x07ff
> 1110nnnn 10nnnnnn 10nnnnnn    0x0800-0xffff

All is fine for the first 3 lines of hex: every byte is less than 0x80,
so it corresponds to itself. Then we hit the è. This is 0xe8, or
11101000. This is interpreted as the start of a 3-byte character of the
form 1000nnnn nnnnnnnn, where the next two bytes are of the form
10nnnnnn. What's the next byte in the file? 0d (carriage return), or
00001101. That doesn't start with 10, so something's gone wrong with the
UTF-8 encoding, and the processor gives an error.

If you want è in your document, you have to encode it into UTF-8 as a
2-byte sequence with nnnnn nnnnnn corresponding to 00011 101000 (e8),
ie 11000011 10101000, which is c3 a8 (shown as Ã¨ when viewed as
Latin-1). Alternatively you could use the entity &egrave; (assuming this
is defined in your DTD - see previous discussions).

Any clearer?

Cheers,

Ian
--
Ian Brockbank, Indigo Active Vision Systems,
The Edinburgh Technopole, Bush Loan, Edinburgh EH26 0PJ
Tel: 0131-475-7234 Fax: 0131-475-7201
work: ian@xxxxxxxxxxxxxx
personal: Ian.Brockbank@xxxxxxxxxxx
web: ScottishDance@xxxxxxxxxxx http://www.scottishdance.net/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
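
For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It is only an illustration, not anything MSXML does internally: the helper names encode_two_byte and is_valid_utf8 and the sample byte strings are made up for this example, and Python's built-in UTF-8 decoder stands in for the parser's check.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Encode a code point in 0x80..0x7ff as 110nnnnn 10nnnnnn."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("not a 2-byte UTF-8 code point")
    lead = 0b11000000 | (code_point >> 6)           # top 5 of the 11 bits
    trail = 0b10000000 | (code_point & 0b00111111)  # low 6 bits
    return bytes([lead, trail])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample data: the element content as a lone Latin-1 byte
# (e8), and the same content re-encoded as the 2-byte UTF-8 sequence.
latin1_body = b"<root>\r\n\xe8\r\n</root>\r\n"
utf8_body = b"<root>\r\n" + encode_two_byte(0xE8) + b"\r\n</root>\r\n"

print(encode_two_byte(0xE8).hex())  # c3a8
print(is_valid_utf8(latin1_body))   # False: e8 is followed by 0d, not 10nnnnnn
print(is_valid_utf8(utf8_body))     # True
```

The first check fails for the same reason as in the hex walk-through: e8 announces a 3-byte sequence, but the byte after it is not a continuation byte.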