Re: [xsl] Character 150 with Windows-1252 output

Subject: Re: [xsl] Character 150 with Windows-1252 output
From: Nic <nferrier@xxxxxxxxxxxxxxxxxxxx>
Date: Fri, 21 Apr 2006 14:10:56 +0100
"andrew welch" <andrew.j.welch@xxxxxxxxx> writes:

> On 4/21/06, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>> > Why is it that #150 gets escaped when using Windows-1252
>> > output encoding when it should contain that character?
>>
>> Because there is no character in the Windows-1252 character set that
>> corresponds to the Unicode character with codepoint 150.
>
> Yes, thanks.  That makes sense now.  The thing I'm struggling with now is this:
>
> This source XML:
>
> <?xml version="1.0" encoding="Windows-1252" ?>
> <foo>&#150;–</foo>
>
> With this stylesheet:
>
> <xsl:stylesheet version="1.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> <xsl:output encoding="US-ASCII"/>
> <xsl:template match="/">
>   <xsl:copy-of select="."/>
> </xsl:template>
> </xsl:stylesheet>
>
> Gives this result:
>
> <foo>&#150;&#8211;</foo>
>
> I've checked the input file with a hex editor to make sure the
> un-escaped dash really is 0x96.  Somehow the two characters are
> treated differently, which is something I didn't expect.
>
> I think that 0x96 in the input XML read using Windows-1252 should
> become #8211 when output using any encoding other than Windows-1252,
> which is what is happening for the actual character 0x96, but the
> character reference #150 gets serialised back as #150...

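Isn't this because &#150; is a Unicode character reference? It's not a
Windows-1252 one. In other words, a character reference always names a
Unicode code point (here U+0096, a C1 control character), whatever the
input encoding; only the raw byte 0x96 is decoded through Windows-1252,
giving U+2013 EN DASH (&#8211;).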
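A quick way to see this outside XSLT, as a rough sketch assuming Python 3
and its standard xml.etree.ElementTree module (not the XSLT processor from
the original post), is to parse the same bytes and serialise them as
US-ASCII:

```python
import xml.etree.ElementTree as ET

# The input from the post: a character reference &#150; followed by the
# raw byte 0x96, in a document declared as Windows-1252.
source = (b'<?xml version="1.0" encoding="Windows-1252"?>'
          b'<foo>&#150;\x96</foo>')

root = ET.fromstring(source)

# The character reference names Unicode code point 150 (U+0096, a C1
# control character); the raw byte 0x96 is decoded via Windows-1252
# to U+2013 EN DASH (decimal 8211).
print([hex(ord(c)) for c in root.text])  # ['0x96', '0x2013']

# Windows-1252 has no character for U+0096, so it cannot be encoded.
try:
    "\u0096".encode("windows-1252")
except UnicodeEncodeError:
    print("U+0096 has no Windows-1252 encoding")

# Serialising as US-ASCII escapes both non-ASCII characters by their
# code points, which matches the result in the post.
print(ET.tostring(root, encoding="us-ascii"))  # b'<foo>&#150;&#8211;</foo>'
```

The two references in the output differ because they name different code
points, not because the serialiser treated the two inputs inconsistently.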


Nic Ferrier
