Subject: Re: Special entity characters in Shift-JIS XSL.
From: Tony Graham <tgraham@xxxxxxxxxxxxxxxx>
Date: Wed, 15 Dec 1999 13:05:11 -0400 (EST)
At 15 Dec 1999 08:55 -0500, Douglas Weed wrote:
> An application has been developed which uses the Microsoft MSXML parser
> enclosed in a DLL to apply XSL files against an XML stream. The encoding is
> Shift-JIS, as the application is double byte. The net result of the
> application is HTML. The target browser has been developed to understand
> certain 'special characters' or entities, which in themselves are double
> byte, much in the same way that &#42; maps to an asterisk. For example,
> &#249;&#134; would yield a special 2-byte character which is a Q surrounded
> by a circle. If this character sequence is placed directly into a .htm
> page, it works. However, as I suspected, when placed within an XSL file and
> transformed with the XML, it yields nothing, since the parser tries to
> format it. I attempted to use an in-line DTD to define the entity and use
> the definition within the XML file; however, MSXML has some real
> difficulties handling an in-line DTD when the XML is a character string and
> not a file. The work-arounds specified by MS are not feasible. The
> question: does another technique exist to have the XSL file ignore
> &#249;&#134; and pass it straight through to the HTML stream? Sorry for the
> length of the message and thanks for any responses.

In XML, numeric character references always refer to Unicode code
points. A conforming application should recognise &#249;&#134; as LATIN
SMALL LETTER U WITH GRAVE (U+00F9) followed by one of the C1 control
characters (U+0086).

What comes out of your MSXML DLL almost certainly uses two bytes to
represent each of those characters: UTF-16 uses two bytes per character,
and UTF-8 also uses two bytes per character for code points in that
range (U+0080 to U+07FF). Either way your two references come out as
four bytes, not the two-byte Shift-JIS sequence you intended. Relying on
two numeric character references to represent a double-byte sequence is
fragile, as you have found.

The numeric character reference for the Unicode character CIRCLED LATIN
CAPITAL LETTER Q is &#x24C6; (decimal &#9414;). I don't know whether
MSXML allows you to specify the output encoding.
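As a minimal sketch of the point above, assuming Python 3.9+ and its standard-library (expat-based) XML parser rather than MSXML, the hypothetical helper `describe` parses a fragment and reports, for each character the parser actually sees, its code point, Unicode name, and encoded length in UTF-8 and UTF-16. It only illustrates how a conforming parser reads the references; it does not reproduce MSXML's or the browser's behaviour.

```python
import unicodedata
import xml.etree.ElementTree as ET


def describe(fragment: str) -> list[tuple[str, str, int, int]]:
    """Parse an XML fragment and, for each character in the root
    element's text, return (code point, Unicode name, UTF-8 length,
    UTF-16 length)."""
    text = ET.fromstring(fragment).text or ""
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",
            # C1 controls such as U+0086 have no Unicode name,
            # so fall back to a placeholder instead of raising.
            unicodedata.name(ch, "<control>"),
            len(ch.encode("utf-8")),
            len(ch.encode("utf-16-le")),  # -le: no byte-order mark
        ))
    return rows


if __name__ == "__main__":
    # Two references intended as the Shift-JIS byte pair 0xF9 0x86:
    # the parser sees two separate Unicode characters.
    for row in describe("<p>&#249;&#134;</p>"):
        print(row)
    # One reference to the character actually wanted.
    for row in describe("<p>&#x24C6;</p>"):
        print(row)
```

Run as a script, it should print ('U+00F9', 'LATIN SMALL LETTER U WITH GRAVE', 2, 2) and ('U+0086', '<control>', 2, 2) for the first fragment, and ('U+24C6', 'CIRCLED LATIN CAPITAL LETTER Q', 3, 2) for the second: four bytes for the two-reference workaround in either encoding, against a single character for &#x24C6;.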
However, if I'm correct in thinking that a circled Q is gaiji in
Shift-JIS, the character might be dropped in a conversion to Shift-JIS
anyway.

Regards,

Tony Graham
======================================================================
Tony Graham                        mailto:tgraham@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.        http://www.mulberrytech.com
17 West Jefferson Street           Direct Phone: 301/315-9632
Suite 207                          Phone: 301/315-9631
Rockville, MD 20850                Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list