Re: Special entity characters in Shift-JIS XSL.

Subject: Re: Special entity characters in Shift-JIS XSL.
From: crism@xxxxxxxxxxxxx (Christopher R. Maden)
Date: Wed, 15 Dec 1999 11:11:57 -0800
[Douglas Weed]
>An application has been developed which uses the Microsoft MSXML parser
>enclosed in a DLL to apply XSL files against an XML stream.  The encoding is
>in Shift-JIS as the application is double byte. The net result of the
>application is HTML.  The target browser has been developed to understand
>certain 'special characters' or entities, which in themselves are double
>byte.  Much in the same way ' maps to an asterisk.  For example
>&#249;&#134; would yield a special 2 byte character which is a Q surrounded
>by a circle.

This is so very wrong.  The parser shouldn't even get past this.

Bytes are not characters.  Characters are not bytes.

Numeric character references (like &#249;) refer to characters.  The
numbers are *always* Unicode; &#249; refers to the Unicode character whose
number is decimal 249.  Even if your current encoding is ASCII, which
doesn't use any bytes over 127, &#249; has the same meaning.  If your
encoding is Latin-2, which assigns a different value to byte 249, &#249;
still has the same meaning.  It does *not* mean byte 0xf9 in the current
encoding.  This is true for HTML as well as for XML, though HTML browsers
are very sloppy about interpreting this correctly.
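
To make the distinction concrete, here is a small sketch in Python (chosen
only for illustration; it is not part of the MSXML setup under discussion).
It assumes the standard library's html.unescape as a stand-in for a
conforming reference resolver, and compares that with decoding the raw byte
0xF9 under two different encodings.

```python
import html

# A numeric character reference names a Unicode code point,
# whatever encoding the document happens to be stored in.
ref = "&#249;"
ch = html.unescape(ref)
code_point = ord(ch)          # 249, i.e. U+00F9 (u with grave)

# The single byte 0xF9 means different characters in different encodings.
raw = b"\xf9"
latin1_char = raw.decode("iso8859_1")   # U+00F9, u with grave
latin2_char = raw.decode("iso8859_2")   # U+016F, u with ring above

print(code_point, hex(ord(latin1_char)), hex(ord(latin2_char)))
```

The reference resolves to U+00F9 in both cases; only the byte's meaning
changes with the encoding.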

Use a reference to the Unicode code point that the two-byte sequence
resolves to, not references to the individual bytes.
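
One way to find that code point is to decode the byte pair and print the
resulting number. The following is a minimal sketch in Python, assuming
the sequence is covered by the standard "shift_jis" codec; the helper name
sjis_bytes_to_ncr is hypothetical, and the sample bytes 0x82 0xA0
(HIRAGANA LETTER A) are just a well-known example, not the circled-Q
character from the original question.

```python
def sjis_bytes_to_ncr(raw: bytes, encoding: str = "shift_jis") -> str:
    """Decode raw bytes, then emit one decimal numeric character
    reference per resulting Unicode character (hypothetical helper)."""
    text = raw.decode(encoding)  # raises UnicodeDecodeError if unmapped
    return "".join(f"&#{ord(ch)};" for ch in text)

# 0x82 0xA0 in Shift-JIS is U+3042, which is decimal 12354.
print(sjis_bytes_to_ncr(b"\x82\xa0"))
```

If the two-byte sequence is vendor-specific or falls in a user-defined
area, a standard codec may refuse to decode it; in that case the mapping
has to come from whoever defined the character.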

-Chris

--
Christopher R. Maden, Solutions Architect
Exemplary Technologies
One Embarcadero Center, Ste. 2405
San Francisco, CA 94111



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

