Subject: Re: [xsl] Character 150 withs Windows-1252 output
From: Nic <nferrier@xxxxxxxxxxxxxxxxxxxx>
Date: Fri, 21 Apr 2006 14:10:56 +0100
"andrew welch" <andrew.j.welch@xxxxxxxxx> writes: > On 4/21/06, Michael Kay <mike@xxxxxxxxxxxx> wrote: >> > Why is it that #150 gets escaped when using Windows-1252 >> > output encoding when it should contain that character? >> >> Because there is no character in the Windows-1252 character set that >> corresponds to the Unicode character with codepoint 150. > > Yes, thanks. That makes sense now. The thing I'm struggling with now is this: > > This source XML: > > <?xml version="1.0" encoding="Windows-1252" ?> > <foo>–b</foo> > > With this stylesheet: > > <xsl:stylesheet version="1.0" > xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> > <xsl:output encoding="US-ASCII"/> > <xsl:template match="/"> > <xsl:copy-of select="."/> > </xsl:template> > </xsl:stylesheet> > > Gives this result: > > <foo>––</foo> > > I've checked the input file with a hex editor to make sure the > un-escaped dash really is 0x96. Somehow the two characters are > treated differently, which is something I didn't expect. > > I think that 0x96 in the input XML read using Windows-1252 should > become #8211 when output using any encoding other than Windows-1252, > which is what is happening for the actual character 0x96, but the > character reference #150 gets serialised back as #150... Isn't this beause – is a unicode entity? It's not a windows-1252 entity. In other words a character entity never changes according to the input encoding. Nic Ferrier