RE: Special entity characters in Shift-JIS XSL.

Subject: RE: Special entity characters in Shift-JIS XSL.
From: Douglas Weed <Dweed@xxxxxxxxxxxx>
Date: Wed, 15 Dec 1999 17:13:51 -0500
Thanks for the replies.  The issue I am having is not related to the
interpretation of the entities as Unicode vs. bytes.  The encoding is double
byte, and the characters I am trying to get into the HTML are placed in the
XSL along with the other HTML tags.  What I really need is a way to
'escape' the &#XXX;&#XXX; decimal references so that the parser will not try
to interpret them as either ASCII or Unicode, but will instead place them in
the output stream as is.  The target browser, obviously not one of the big
2, has a special code page which will translate the two entities into one
double-byte image.  The code page is similar to Wingdings, found in the
TrueType fonts on Windows.  Thanks again for the replies.
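
A short sketch of what a conforming XML parser does with the references described above. It uses Python's standard xml.etree.ElementTree purely for illustration (neither Python nor ElementTree appears in the thread, and MSXML may differ in detail). The parser turns &#249;&#134; into the two characters U+00F9 and U+0086, not into two bytes. Writing &amp;#249; does yield the literal text "&#249;", but an ordinary serializer escapes the ampersand again when writing output.

```python
import xml.etree.ElementTree as ET

# Each numeric character reference becomes ONE Unicode character,
# never a raw byte: &#249; -> U+00F9, &#134; -> U+0086.
parsed = ET.fromstring("<p>&#249;&#134;</p>").text
print([f"U+{ord(c):04X}" for c in parsed])  # ['U+00F9', 'U+0086']

# Escaping the ampersand makes the parser yield the literal text "&#249;&#134;" ...
literal = ET.fromstring("<p>&amp;#249;&amp;#134;</p>").text
print(literal)  # &#249;&#134;

# ... but a normal serializer escapes "&" again on output, so the markup
# written out is "&amp;#249;&amp;#134;", not the raw references.
out = ET.Element("p")
out.text = literal
print(ET.tostring(out, encoding="unicode"))  # <p>&amp;#249;&amp;#134;</p>
```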

-----Original Message-----
From: crism@xxxxxxxxxxxxx [mailto:crism@xxxxxxxxxxxxx]
Sent: Wednesday, December 15, 1999 2:12 PM
To: xsl-list@xxxxxxxxxxxxxxxx
Subject: Re: Special entity characters in Shift-JIS XSL.


[Douglas Weed]
>An application has been developed which uses the Microsoft MSXML parser
>enclosed in a DLL to apply XSL files against an XML stream.  The encoding is
>Shift-JIS, as the application is double byte.  The net result of the
>application is HTML.  The target browser has been developed to understand
>certain 'special characters' or entities, which in themselves are double
>byte.  Much in the same way &#42; maps to an asterisk.  For example
>&#249;&#134; would yield a special 2 byte character which is a Q surrounded
>by a circle.

This is so very wrong.  The parser shouldn't even get past this.

Bytes are not characters.  Characters are not bytes.

Numeric character references (like &#249;) refer to characters.  The
numbers are *always* Unicode; &#249; refers to the Unicode character whose
number is decimal 249.  Even if your current encoding is ASCII, which
doesn't use any bytes over 127, &#249; has the same meaning.  If your
encoding is Latin-2, which assigns a different value to byte 249, &#249;
still has the same meaning.  It does *not* mean byte 0xf9 in the current
encoding.  This is true for HTML as well as for XML, though HTML browsers
are very sloppy about interpreting this correctly.
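
The point above can be checked with a small Python sketch (Python and its xml.etree.ElementTree module are illustrative choices, not part of the thread). The same &#249; reference parses to U+00F9 whatever encoding the document declares. To stay runnable, it only uses encodings the expat parser supports natively, whereas Chris's example was Latin-2. The raw byte 0xF9, by contrast, decodes to different characters under Latin-1 and Latin-2.

```python
import xml.etree.ElementTree as ET

def parse_ncr(declared_encoding: str) -> str:
    """Parse a tiny XML document that declares `declared_encoding` and
    contains the reference &#249;; return the element's text."""
    doc = f'<?xml version="1.0" encoding="{declared_encoding}"?><a>&#249;</a>'
    return ET.fromstring(doc.encode(declared_encoding)).text

# The reference means U+00F9 no matter which encoding the document declares.
for enc in ("US-ASCII", "ISO-8859-1", "UTF-8"):
    print(enc, f"U+{ord(parse_ncr(enc)):04X}")  # U+00F9 each time

# The raw BYTE 0xF9, by contrast, means different characters in different encodings.
byte_f9 = bytes([249])
print(byte_f9.decode("iso8859-1"))  # U+00F9, LATIN SMALL LETTER U WITH GRAVE
print(byte_f9.decode("iso8859-2"))  # U+016F, LATIN SMALL LETTER U WITH RING ABOVE
```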

Use a reference to the Unicode code point that that two-byte sequence
resolves to, not references to the bytes themselves.
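
A minimal Python sketch of that advice, under stated assumptions. The helper names wrong_refs and char_refs are hypothetical. The example character is a standard Shift_JIS one (HIRAGANA LETTER A, bytes 0x82 0xA0), because the browser's special glyphs are vendor-defined and no standard codec is known to cover them. Writing one reference per byte produces two unrelated characters. Decoding the pair first produces the single code point it actually represents.

```python
def wrong_refs(raw: bytes) -> str:
    """One decimal reference per BYTE (the incorrect approach)."""
    return "".join(f"&#{b};" for b in raw)

def char_refs(raw: bytes, encoding: str) -> str:
    """Decode `raw` with the code page that produced it, then emit one
    decimal reference per resulting Unicode character."""
    return "".join(f"&#{ord(ch)};" for ch in raw.decode(encoding))

sjis = b"\x82\xa0"  # HIRAGANA LETTER A (U+3042) in Shift_JIS
print(wrong_refs(sjis))              # &#130;&#160;  (two unrelated characters)
print(char_refs(sjis, "shift_jis"))  # &#12354;      (U+3042, one character)
```

For the browser's own glyphs, the correct code point depends on how that browser's code page maps its extra characters. If it happened to follow Microsoft's code page 932, whose user-defined rows (lead bytes 0xF0 to 0xF9, which include the 0xF9 0x86 pair above) map into the Unicode Private Use Area, the reference would name a Private Use Area code point. That is an assumption about the browser, not something the thread confirms.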

-Chris

--
Christopher R. Maden, Solutions Architect
Exemplary Technologies
One Embarcadero Center, Ste. 2405
San Francisco, CA 94111



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list