Re: [xsl] 16-bit entities converted to "?" by XSLT

Subject: Re: [xsl] 16-bit entities converted to "?" by XSLT
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Sat, 06 Dec 2008 10:32:07 -0500
At 2008-12-06 15:31 +0000, you wrote:
I have a pass-through rule in my XSLT stylesheet to allow me to copy
embedded HTML unchanged to the output:

  <xsl:template match='div|span'>
    <xsl:copy-of select="."/>
  </xsl:template>

However, when I have 16-bit entities in the HTML, they are translated
to question marks in the output. For example, the following contains
some Hebrew characters:

  <span style='font-size: 11pt; font-family: Arial;' lang='HE'>

My browser shows this correctly, but when I embed it in some XML and
run it through my stylesheet, the output is this:

  <span style='font-size: 11pt; font-family: Arial;' lang='HE'>

That doesn't sound like conformant XSLT to me ... which XSLT processor are you using and which XML processor is being used by it (or overridden by you)?

My training material has Hebrew and Arabic characters in source files showing the nuances of bidirectional text, and I've had no problems with Saxon.

I've tried <xsl:output encoding="UTF-16"> and various other things, but
nothing seems to work. Is there an easy way to fix this so I can just
display 16-bit characters?

I'm guessing it is a problem with the building of the source tree and not with the serialization of the result tree.
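
One way to test that guess is to look at the code points the processor actually holds in the source tree. The following is only a diagnostic sketch, and it assumes an XSLT 2.0 processor such as Saxon, because `string-to-codepoints()` and the `separator` attribute are not available in XSLT 1.0. It writes plain numbers as ASCII text, so the serializer cannot distort the result.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Diagnostic sketch (XSLT 2.0): list the code points held in the
     source tree for each div and span element. -->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain ASCII text output, so serialization cannot hide anything -->
  <xsl:output method="text" encoding="US-ASCII"/>

  <xsl:template match="/">
    <xsl:for-each select="//div | //span">
      <xsl:value-of select="name()"/>
      <xsl:text>: </xsl:text>
      <!-- Hebrew letters fall in 1424 to 1535 (U+0590 to U+05FF);
           63 is a literal question mark -->
      <xsl:value-of select="string-to-codepoints(string(.))" separator=" "/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

If the list already shows 63 where the Hebrew letters should be, the characters were lost before the transformation started, which points at the XML processor or at the encoding of the input file rather than at `xsl:output`.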

You could try experimenting with those entities in your stylesheet, but odds are the same XML processor will load the stylesheet tree in the same erroneous fashion as the source tree.
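
That stylesheet experiment could look something like the sketch below. The four Hebrew letters are a hypothetical sample and are not taken from the original post. Declaring a US-ASCII output encoding is an assumption made here to keep the test independent of how the browser or editor decodes the result: the output encoding cannot represent Hebrew, so a conformant serializer is expected to fall back to numeric character references.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- US-ASCII cannot hold Hebrew, so the serializer is expected to
       write numeric character references instead of raw characters -->
  <xsl:output method="html" encoding="US-ASCII"/>

  <xsl:template match="/">
    <span lang="HE">
      <!-- Hypothetical sample text: shin, lamed, vav, final mem -->
      <xsl:text>&#x5E9;&#x5DC;&#x5D5;&#x5DD;</xsl:text>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

Question marks in the output of this stylesheet would suggest that the stylesheet tree is being damaged on load in the same way as the source tree. Character references in the output would suggest the processor handles the characters correctly once it has them.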

I hope this helps.

. . . . . . . . . . . . Ken

G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.