Subject: Re: [xsl] Character 150 withs Windows-1252 output
From: "andrew welch" <andrew.j.welch@xxxxxxxxx>
Date: Fri, 21 Apr 2006 14:21:48 +0100
> > Gives this result:
> >
> > <foo>––</foo>
> >
> > I've checked the input file with a hex editor to make sure the
> > un-escaped dash really is 0x96. Somehow the two characters are
> > treated differently, which is something I didn't expect.
> >
> > I think that 0x96 in the input XML read using Windows-1252 should
> > become #8211 when output using any encoding other than Windows-1252,
> > which is what is happening for the actual character 0x96, but the
> > character reference #150 gets serialised back as #150...
>
> Isn't this because &#150; is a unicode entity? It's not a windows-1252
> entity. In other words a character entity never changes according to
> the input encoding.

Ahh, of course, that makes sense. The character for #150 is worked out after the bytes in the document have been parsed using the encoding specified in the prolog... So 0x96 becomes #8211 through the mapping defined in Windows-1252, and #150 remains as #150 because it's a character reference, and character references always refer to Unicode code points, whatever the encoding of the document.

Thanks Nic!
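
For anyone who wants to see the two cases side by side, here is a rough sketch in Python rather than XSLT. It is not the stylesheet or processor from this thread: the input document and the variable names are made up for illustration, and it assumes the standard library's ElementTree parser treats the prolog encoding the same way.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the character reference &#150; followed by a raw 0x96
# byte, in a document whose prolog declares Windows-1252.
doc = b'<?xml version="1.0" encoding="windows-1252"?><foo>&#150;\x96</foo>'

root = ET.fromstring(doc)

# The raw byte is decoded through the Windows-1252 table (0x96 -> U+2013),
# while the reference is read as a Unicode code point (U+0096).
code_points = [ord(ch) for ch in root.text]
print(code_points)  # [150, 8211]

# Serialising to an encoding that can hold neither character turns both
# into numeric references.
ascii_out = ET.tostring(root, encoding="us-ascii")
print(ascii_out)

# Serialising back to Windows-1252 writes U+2013 as the byte 0x96 again,
# but U+0096 has no slot in that code page, so it stays a reference.
cp1252_out = ET.tostring(root, encoding="windows-1252")
print(cp1252_out)
```

The us-ascii output contains &#150;&#8211;, which matches the behaviour described above: the two dashes look the same in the source but are different characters once parsed.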