Subject: RE: Special entity characters in Shift-JIS XSL.
From: Douglas Weed <Dweed@xxxxxxxxxxxx>
Date: Wed, 15 Dec 1999 17:13:51 -0500

Thanks for the replies. The issue I am having is not related to the interpretation of the entities as Unicode vs. bytes. The encoding is double byte, and the characters I am trying to get into the HTML are placed in the XSL, along with the other HTML tags. What I really need is a way to 'escape' the &#XXX;&#XXX; decimal references so that the parser will not try to interpret them as either ASCII or Unicode. Instead, it should place them in the output stream as is. The target browser, obviously not one of the big 2, has a special code page which will translate the two entities into 1 double byte image. The code page is similar to Wingdings found in the TrueType fonts on Windows.

Thanks again for the replies.

-----Original Message-----
From: crism@xxxxxxxxxxxxx [mailto:crism@xxxxxxxxxxxxx]
Sent: Wednesday, December 15, 1999 2:12 PM
To: xsl-list@xxxxxxxxxxxxxxxx
Subject: Re: Special entity characters in Shift-JIS XSL.

[Douglas Weed]
>An application has been developed which uses the Microsoft MSXML parser
>enclosed in a DLL to apply XSL files against an XML stream. The encoding is
>in Shift-JIS as the application is double byte. The net result of the
>application is HTML. The target browser has been developed to understand
>certain 'special characters' or entities, which in themselves are double
>byte. Much in the same way ' maps to an asterisk. For example
>ù† would yield a special 2 byte character which is a Q surrounded
>by a circle.

This is so very wrong. The parser shouldn't even get past this.

Bytes are not characters. Characters are not bytes.

Numeric character references (like &#249;) refer to characters. The numbers are *always* Unicode; &#249; refers to the Unicode character whose number is decimal 249. Even if your current encoding is ASCII, which doesn't use any bytes over 127, &#249; has the same meaning. If your encoding is Latin-2, which assigns a different value to byte 249, &#249; still has the same meaning. It does *not* mean byte 0xf9 in the current encoding.
This is true for HTML as well as for XML, though HTML browsers are very sloppy about interpreting this correctly.

Use a reference to the Unicode code point that that two-byte sequence resolves to, not references to the bytes themselves.

-Chris
--
Christopher R. Maden, Solutions Architect
Exemplary Technologies
One Embarcadero Center, Ste. 2405
San Francisco, CA 94111

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
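
As a rough illustration of the point above (not something from the thread itself), the following Python sketch shows that a numeric character reference always names a Unicode code point, while the meaning of a raw byte depends on the encoding. It then derives the reference for a two-byte Shift-JIS sequence. The function name sjis_bytes_to_reference is hypothetical, and the bytes 0x82 0xA0 (hiragana "a" in standard Shift-JIS) are only an example. A vendor-specific character such as the circled Q would presumably not be known to a stock Shift-JIS codec and would need its own mapping.

```python
import html

# A numeric character reference names a Unicode code point,
# regardless of the encoding the document is stored in.
char = html.unescape("&#249;")
assert ord(char) == 0xF9  # U+00F9, LATIN SMALL LETTER U WITH GRAVE

# The raw byte 0xF9, by contrast, depends on the encoding used to read it.
latin1_char = bytes([0xF9]).decode("iso8859_1")  # U+00F9
latin2_char = bytes([0xF9]).decode("iso8859_2")  # U+016F, a different character


def sjis_bytes_to_reference(raw: bytes) -> str:
    """Hypothetical helper: decode exactly one Shift-JIS character and
    return the decimal numeric character reference for its code point."""
    text = raw.decode("shift_jis")
    if len(text) != 1:
        raise ValueError("expected exactly one character")
    return f"&#{ord(text)};"


# Example bytes chosen for illustration: 0x82 0xA0 in standard Shift-JIS.
example = sjis_bytes_to_reference(b"\x82\xa0")
print(example)
```

The reference produced this way can be placed in a stylesheet written in any encoding, since a conforming parser resolves it to the same character every time.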