[xsl] UTF-8 Output

Subject: [xsl] UTF-8 Output
From: Jim Schmidt <JSchmidt@xxxxxxxx>
Date: Thu, 10 May 2001 14:09:14 -0700
I have a style sheet that outputs an XML document as HTML. Some of the XML
elements contain HTML, so I output them with the following style sheet,
which I include from the document style sheet.

<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0" >

<xsl:template match=" a | applet | b | big | body | br | caption | 
    cite | code | col | colgroup | dd | div | dl | dt | em | font | 
    form | frame | frameset | head | h1 | h2 | h3 | h4 | h5 | h6 | hr | 
    html | i | iframe | link | li | map | meta | noframes | ol | 
    p | param | pre | s | script | small | span | strong | style | 
    sub | sup | td | th | tr | tt | ul | var | table ">
  <xsl:copy>
    <xsl:copy-of select="@*" />
    <xsl:apply-templates />
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

The document style sheet includes the output tag shown below:

<xsl:output method="html" indent="yes" xalan:indent-amount="4"
encoding="UTF-8" />
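
For context, here is a minimal sketch of how the document style sheet might tie these pieces together. It is not the actual file: the name "html-elements.xsl" for the included style sheet is hypothetical, and the URI bound to the xalan prefix is an assumption based on the namespace Xalan-Java documents for its output extension attributes.

```xml
<?xml version="1.0"?>
<!-- Sketch only: "html-elements.xsl" is a hypothetical file name, and the
     xalan namespace URI below is assumed, not taken from the original. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xalan="http://xml.apache.org/xslt"
                version="1.0">

  <!-- The output declaration quoted above; the xalan prefix must be
       declared for the xalan:indent-amount attribute to be well-formed. -->
  <xsl:output method="html" indent="yes" xalan:indent-amount="4"
              encoding="UTF-8" />

  <!-- Pull in the templates that copy embedded HTML elements through. -->
  <xsl:include href="html-elements.xsl" />

  <!-- Document-specific templates would follow here. -->

</xsl:stylesheet>
```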

The XML document also defines its encoding as UTF-8.

Everything works very well except for Unicode characters. I am using Xalan
2.0. When I look at the XSL trace, the Unicode characters are correct, but
when I look at the HTML source, some of the Unicode bytes have been
converted to HTML entities. As a result, Unicode characters are not
displayed correctly in a browser. If I change the entities in the HTML back
to the proper characters, the page displays correctly.

What am I doing wrong? Should I be using &#nnnn; in the original XML? Or
should Xalan be able to output Unicode but my template is wrong? I have read
the FAQ and archive regarding UTF-8 encoding but can't seem to get it right.
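
To make the two input options concrete, the fragment below is a hypothetical illustration (the element names and the sample word are invented, not taken from the real document). Both paragraphs carry the same character, U+00E9, and an XML parser hands the same text to the style sheet in either case, so the choice should only matter if the file's actual bytes do not match its declared encoding.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical fragment: both paragraphs hold the same character, U+00E9. -->
<doc>
  <!-- Written as a numeric character reference. -->
  <p>caf&#233;</p>
  <!-- Written as a literal character, stored as UTF-8 bytes in the file. -->
  <p>café</p>
</doc>
```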

Jim


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

