Subject: Re: [xsl] Global parameters with UTF-8 characters and ???s <Disregard Previous>
From: "andrew welch" <andrew.j.welch@xxxxxxxxx>
Date: Thu, 3 Aug 2006 10:32:07 +0100
On 8/2/06, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > Is this the right solution? Or does it just point out what
> > the issue is?
>
> It's a viable workaround. But it suggests that there is some kind of
> configuration problem somewhere, perhaps with the web server.
It effectively takes encoding out of the equation: the ASCII characters &#nnn; are written to disk instead of a single Unicode character, and the browser reads ASCII (and resolves the reference itself) instead of decoding the single Unicode character.
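To make the idea concrete, here is a minimal Java sketch of what "writing &#nnn; instead of the character" amounts to. The class and method names are made up for illustration, and it is not how the stylesheet or serializer in this thread does it; it also leaves markup characters such as & and < alone.

```java
class NumericRefs {
    /** Replaces every non-ASCII code point with a decimal character reference. */
    static String toNumericRefs(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.append((char) cp);                    // plain ASCII passes through
            } else {
                out.append("&#").append(cp).append(';');  // e.g. U+00E9 becomes &#233;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e acute) and U+20AC (euro sign) become ASCII-only references.
        System.out.println(toNumericRefs("caf\u00e9 \u20ac5")); // caf&#233; &#8364;5
    }
}
```

The result contains only bytes below 0x80, so it reads the same whether the file is treated as UTF-8, ISO-8859-1 or US-ASCII.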
If you can see the correct characters in the browser now, then it suggests they are contained in the font that's being used, and the problem lies with the file being written in one encoding and read in another. When the target encoding doesn't contain a mapping for a given character, a question mark ? is substituted to mean "no mapping".
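Both failure modes can be reproduced in a few lines of Java. This is only a sketch with hypothetical names; the euro sign is used because ISO-8859-1 has no mapping for it.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatch {
    /** Encodes text to ISO-8859-1; characters without a mapping become '?' (0x3F). */
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Decodes UTF-8 bytes with the wrong charset, as a misconfigured reader would. */
    static String readUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // not representable in ISO-8859-1
        System.out.println(toLatin1(euro)[0] == 0x3F);       // true: the character is lost
        System.out.println(readUtf8AsLatin1(euro).length()); // 3: one character read as three
    }
}
```

In the first case the information is gone for good; in the second the bytes are intact and only the reader is wrong.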
If you use a hex editor at every stage of the process to find out when the bytes for the character ? are 0x3F (meaning the ? really is a ? and it's not just your viewer), then you'll know that the stage that wrote those bytes was the culprit.
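If no hex editor is to hand, a small Java helper can do the same check. This is a sketch with made-up names; pass it the path of an intermediate file from whichever stage you are inspecting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class HexCheck {
    /** Formats bytes as space-separated two-digit hex values. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Dumps the raw bytes of a file, with no decoding involved. */
    static String hexOfFile(Path file) throws IOException {
        return toHex(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(hexOfFile(Paths.get(args[0])));
            return;
        }
        // A euro sign that survived as UTF-8, versus one already replaced by '?'.
        System.out.println(toHex("\u20ac".getBytes(StandardCharsets.UTF_8))); // e2 82 ac
        System.out.println(toHex("?".getBytes(StandardCharsets.UTF_8)));      // 3f
    }
}
```

A lone 3f where a non-ASCII character should be means the damage was done before that file was written.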
If you are using Java then it's often a case of setting the default platform encoding to UTF-8.
This ensures any operations that involve encodings (where an optional encoding argument hasn't been specified, e.g. getBytes()) will use UTF-8. If you don't specify this then the platform default is used (on Western-locale Windows that is windows-1252, a close relative of ISO-8859-1, afaik).
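The setting itself did not survive in this copy of the message. One common way to do it (an assumption here, not necessarily what was originally shown) is to start the JVM with -Dfile.encoding=UTF-8. The sketch below, with hypothetical names, prints the default the JVM actually picked up and shows the safer habit of passing the charset explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDefaults {
    /** Name of the charset used when no encoding argument is given. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    /** Explicit encoding: the result does not depend on the platform default. */
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + defaultCharsetName());
        // Length depends on the default charset, so it may differ between machines.
        System.out.println("\u00e9".getBytes().length);
        // Always 2: U+00E9 is two bytes in UTF-8.
        System.out.println(utf8Bytes("\u00e9").length);
    }
}
```

Running it with and without the JVM flag is a quick way to confirm whether the default is what you think it is.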
I've just noticed in your other post that you are using JSPs, in which case also ensure you set the pageEncoding in the page directive:
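The directive is also missing from this copy of the message. A typical form, assuming UTF-8 is wanted both for the JSP source and for the response sent to the browser, would be something like:

```jsp
<%-- pageEncoding: how the container reads this .jsp file;
     contentType charset: what the response declares to the browser --%>
<%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" %>
```

It goes at the top of the page; adjust the content type if the JSP is not serving HTML.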