Re: [xsl] Character 150 with Windows-1252 output

From: "andrew welch" <andrew.j.welch@xxxxxxxxx>
Date: Fri, 21 Apr 2006 13:56:13 +0100
On 4/21/06, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > Why is it that #150 gets escaped when using Windows-1252
> > output encoding when it should contain that character?
>
> Because there is no character in the Windows-1252 character set that
> corresponds to the Unicode character with codepoint 150.

Yes, thanks, that makes sense now. What I'm still struggling with is
this:

This source XML:

<?xml version="1.0" encoding="Windows-1252" ?>
<foo>&#150;–</foo>

With this stylesheet:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="US-ASCII"/>
<xsl:template match="/">
  <xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>

Gives this result:

<foo>&#150;&#8211;</foo>

I've checked the input file with a hex editor to make sure the
un-escaped dash really is 0x96.  Somehow the two characters are
treated differently, which is something I didn't expect.
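The hex-editor check can also be made from the parser's side. A minimal sketch, assuming Python's standard xml.etree.ElementTree as a stand-in for the XSLT processor's parser (the post doesn't name one) and with the file's bytes written inline rather than read from disk. Under the XML rules a numeric character reference names a Unicode code point whatever the document's declared encoding, so &#150; should come out as U+0096 (a C1 control character), while the raw byte 0x96 is decoded through Windows-1252 to U+2013 (EN DASH).

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the input file: a character reference followed by
# the literal byte 0x96 (an en dash in Windows-1252).
SOURCE = b'<?xml version="1.0" encoding="Windows-1252" ?>\n<foo>&#150;\x96</foo>'

# ElementTree (assumed stand-in parser) decodes the bytes using the
# encoding named in the XML declaration.
root = ET.fromstring(SOURCE)

# Show the code point of each character the parser produced.
for ch in root.text:
    print(f"U+{ord(ch):04X}")  # prints U+0096, then U+2013
```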

I think that the byte 0x96 in the input XML, read as Windows-1252,
should become &#8211; when output in any encoding other than
Windows-1252. That is what happens to the literal 0x96 byte, but the
character reference &#150; gets serialised back as &#150;...
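Given those two code points, the serialised result follows from how characters missing from the output encoding are handled. A minimal sketch, again using xml.etree.ElementTree as a stand-in serializer (an assumption, not the processor from the post), which writes any character the target encoding cannot represent as a decimal character reference:

```python
import xml.etree.ElementTree as ET

# The two characters a parser delivers for <foo>&#150;[byte 0x96]</foo>
# read as Windows-1252: U+0096 from the reference, U+2013 from the byte.
foo = ET.Element("foo")
foo.text = "\u0096\u2013"

# ElementTree writes characters the target encoding cannot represent
# as decimal character references (&#NNN;).
ascii_out = ET.tostring(foo, encoding="us-ascii")
cp1252_out = ET.tostring(foo, encoding="windows-1252")

print(ascii_out)  # b'<foo>&#150;&#8211;</foo>'
print(cp1252_out)
```

With US-ASCII output both characters are escaped, matching the result above. With Windows-1252 output the en dash is written as the single byte 0x96, but U+0096 has no Windows-1252 byte, so it stays &#150;.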

Any thoughts?
