Subject: Re: Special entity characters in Shift-JIS XSL.
From: crism@xxxxxxxxxxxxx (Christopher R. Maden)
Date: Wed, 15 Dec 1999 11:11:57 -0800

[Douglas Weed]
>An application has been developed which uses the Microsoft MSXML parser
>enclosed in a DLL to apply XSL files against an XML stream. The encoding is
>in Shift-JIS as the application is double byte. The net result of the
>application is HTML. The target browser has been developed to understand
>certain 'special characters' or entities, which in themselves are double
>byte. Much in the same way ' maps to an asterisk. For example
>&#249;&#134; would yield a special 2 byte character which is a Q surrounded
>by a circle.

This is so very wrong. The parser shouldn't even get past this.

Bytes are not characters. Characters are not bytes.

Numeric character references (like &#249;) refer to characters. The numbers are *always* Unicode; &#249; refers to the Unicode character whose number is decimal 249. Even if your current encoding is ASCII, which doesn't use any bytes over 127, &#249; has the same meaning. If your encoding is Latin-2, which assigns a different character to byte 249, &#249; still has the same meaning. It does *not* mean byte 0xf9 in the current encoding.

This is true for HTML as well as for XML, though HTML browsers are very sloppy about interpreting this correctly.

Use a reference to the Unicode code point that the two-byte sequence resolves to, not references to the individual bytes.

-Chris
--
Christopher R. Maden, Solutions Architect
Exemplary Technologies
One Embarcadero Center, Ste. 2405
San Francisco, CA 94111

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
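A small Python sketch of the two points made in the reply above. It assumes Python 3's standard `xml.etree.ElementTree` parser and its `cp932` codec (Microsoft's Shift-JIS variant) as stand-ins for the MSXML setup in the thread. `parse_text`, `sjis_to_char_refs`, and `byte_refs` are hypothetical helpers written for illustration. Because the vendor's circled-Q code is not given, the sketch uses standard characters (HIRAGANA LETTER A and CIRCLED DIGIT ONE) instead. The first part shows that `&#249;` names U+00F9 whether the document is declared US-ASCII or ISO-8859-2 (Latin-2), while a raw 0xF9 byte in a Latin-2 document is a different character. The second part decodes a Shift-JIS byte pair to Unicode first and references that code point, instead of referencing each byte.

```python
import unicodedata
import xml.etree.ElementTree as ET


def parse_text(doc: bytes) -> str:
    """Parse a one-element XML document and return its text content."""
    return ET.fromstring(doc).text


def sjis_to_char_refs(data: bytes, encoding: str = "cp932") -> str:
    """Hypothetical helper: decode Shift-JIS bytes to Unicode characters
    first, then emit one numeric character reference per character."""
    return "".join(f"&#{ord(ch)};" for ch in data.decode(encoding))


def byte_refs(data: bytes) -> str:
    """The mistake discussed in the thread: one reference per raw byte."""
    return "".join(f"&#{b};" for b in data)


# 1. &#249; means U+00F9 whatever the declared encoding is;
#    only a raw 0xF9 byte is interpreted through the encoding.
docs = {
    "&#249; in US-ASCII": b'<?xml version="1.0" encoding="US-ASCII"?><p>&#249;</p>',
    "&#249; in ISO-8859-2": b'<?xml version="1.0" encoding="ISO-8859-2"?><p>&#249;</p>',
    "raw 0xF9 in ISO-8859-2": b'<?xml version="1.0" encoding="ISO-8859-2"?><p>\xf9</p>',
}
for label, doc in docs.items():
    ch = parse_text(doc)
    print(f"{label:24} U+{ord(ch):04X} {unicodedata.name(ch)}")

# 2. Shift-JIS bytes 0x82 0xA0 encode HIRAGANA LETTER A (U+3042).
print(sjis_to_char_refs(b"\x82\xa0"))  # &#12354;      (the intended character)
print(byte_refs(b"\x82\xa0"))          # &#130;&#160;  (two unrelated characters)

# cp932 also covers NEC row 13, e.g. 0x87 0x40 is CIRCLED DIGIT ONE (U+2460).
print(sjis_to_char_refs(b"\x87\x40"))  # &#9312;
```

The reference produced in step 2 means the same thing no matter what encoding the stylesheet output is later written in, which is exactly the property the byte-wise version lacks.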