Subject: Re: [xsl] Global parameters with UTF-8 characters and ???s <Disregard Previous>
From: "andrew welch" <andrew.j.welch@xxxxxxxxx>
Date: Thu, 3 Aug 2006 10:32:07 +0100
On 8/3/06, andrew welch <andrew.j.welch@xxxxxxxxx> wrote:
On 8/2/06, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > Is this the right solution?  Or does it just point out what
> > the issue is?
>
> It's a viable workaround. But it suggests that there is some kind of
> configuration problem somewhere, perhaps with the web server.

It effectively takes encoding out of the equation: the ASCII
characters &#nnn; are written to disk instead of a single Unicode
character, and the browser reads ASCII instead of the single Unicode
character.
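To make the workaround concrete, here is a minimal Java sketch of what escaping to character references does. The class and the `escapeNonAscii` helper are hypothetical illustrations, not part of any XSLT processor's API. The sketch shows that once every non-ASCII character becomes `&#nnn;`, the bytes are identical whether they are written as UTF-8 or ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NcrEscape {
    // Hypothetical helper: replace every non-ASCII code point with a
    // numeric character reference (&#nnn;) so the result is pure ASCII.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "caf\u00e9 \u20ac";  // e-acute and the euro sign
        String escaped = escapeNonAscii(original);
        System.out.println(escaped);  // caf&#233; &#8364;

        // ASCII-only text encodes to the same bytes in UTF-8 and ISO-8859-1,
        // so a mismatch between writer and reader can no longer corrupt it.
        byte[] utf8 = escaped.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = escaped.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(utf8, latin1));  // true
    }
}
```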

If you can see the correct characters in the browser now, that
suggests they are contained in the font being used, and that the
problem lies with the file being written in one encoding and read in
another. When the target encoding has no mapping for a given character
or byte sequence, a question mark (?) is typically substituted to mean
"no mapping".
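A minimal Java sketch of the "no mapping" case: encoding a character that ISO-8859-1 cannot represent. The class name is illustrative; the behaviour shown is that of the standard `String.getBytes(Charset)` method, which substitutes the charset's replacement bytes for unmappable characters.

```java
import java.nio.charset.StandardCharsets;

public class UnmappableDemo {
    public static void main(String[] args) {
        String euro = "\u20ac";  // EURO SIGN: has no mapping in ISO-8859-1

        // getBytes(Charset) substitutes the charset's replacement bytes for an
        // unmappable character; for ISO-8859-1 that is a single '?' (0x3F).
        byte[] latin1 = euro.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length + " byte: 0x" + Integer.toHexString(latin1[0]));  // 1 byte: 0x3f

        // UTF-8 can encode every Unicode character, so nothing is lost.
        byte[] utf8 = euro.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes");  // 3 bytes
    }
}
```

Once that 0x3F byte is on disk, the original character is gone for good; no later stage can recover it.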

If you use a hex editor at every stage of the process to find the
first point at which the bytes for the character are 0x3F (meaning the
? really is a ? and it's not just your viewer), you'll know that the
stage which produced that output was the culprit.
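If a hex editor isn't handy, a small Java program can do the same check on each intermediate file. This is a hypothetical sketch: the class name, the helpers and the command-line usage are assumptions. Note that it counts every 0x3F byte, so legitimate question marks (for example in `<?xml ... ?>` or URL query strings) are counted too.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HexCheck {
    // Hypothetical helper: format bytes as space-separated uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Count 0x3F bytes: real '?' characters in the file, not a display artifact.
    static int countQuestionMarks(byte[] bytes) {
        int n = 0;
        for (byte b : bytes) {
            if (b == 0x3F) n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: java HexCheck output.html
        Path file = Paths.get(args[0]);
        byte[] bytes = Files.readAllBytes(file);
        System.out.println(toHex(bytes));
        System.out.println("0x3F bytes: " + countQuestionMarks(bytes));
    }
}
```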

If you are using Java, it's often a case of setting the default
platform encoding to UTF-8:

System.setProperty("file.encoding", "UTF-8");

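This is meant to ensure that any operations involving encodings (where
the optional encoding argument hasn't been specified, e.g. getBytes())
use UTF-8. If you don't specify this then ISO-8859-1 is used (on Windows
platforms anyway, afaik; strictly, Western-locale Windows systems use
the closely related code page Cp1252). Note that the JVM normally fixes
its default charset at startup, so passing -Dfile.encoding=UTF-8 on the
java command line is more reliable than calling System.setProperty()
from code that is already running.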
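An alternative that doesn't depend on the platform default at all is to name the charset explicitly wherever bytes and characters are converted. A minimal Java sketch; the class and the `roundTrip` helper are hypothetical, while `Charset.defaultCharset()`, `String.getBytes(Charset)` and `Files.newBufferedWriter(Path, Charset)` are standard JDK calls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharset {
    // Hypothetical helper: write text to a temporary file and read it back,
    // naming the same charset explicitly on both sides.
    static String roundTrip(String text, Charset cs) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt");
            try {
                try (Writer w = Files.newBufferedWriter(tmp, cs)) {
                    w.write(text);
                }
                return new String(Files.readAllBytes(tmp), cs);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The platform default used by no-argument calls such as getBytes().
        System.out.println("default charset: " + Charset.defaultCharset());

        String text = "caf\u00e9 \u20ac";
        byte[] platformBytes = text.getBytes();                    // varies by platform
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);  // always 9 bytes
        System.out.println(platformBytes.length + " vs " + utf8Bytes.length);

        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));  // true
    }
}
```

Stating the charset at each conversion keeps the behaviour the same on every machine, whatever file.encoding happens to be.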

I've just noticed in your other post that you are using JSPs, in which case also ensure you set the pageEncoding in the page directive:

<%@ page pageEncoding="UTF-8" .....

This is the encoding the container uses to read the JSP source file
when it converts the JSP into a servlet.
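To see why a wrong pageEncoding matters, here is a plain-Java simulation of the decoding step only; it uses no JSP or servlet API. The class and the `decodeSource` helper are hypothetical, and the sketch assumes a page saved as UTF-8 but decoded by a container falling back to ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageEncodingDemo {
    // Hypothetical stand-in for the container decoding a page's raw bytes.
    static String decodeSource(byte[] sourceBytes, Charset pageEncoding) {
        return new String(sourceBytes, pageEncoding);
    }

    public static void main(String[] args) {
        // Page text as saved by an editor configured for UTF-8.
        byte[] saved = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);

        // Matching encoding: the e-acute survives translation.
        String right = decodeSource(saved, StandardCharsets.UTF_8);
        System.out.println(right.equals("<p>caf\u00e9</p>"));  // true

        // Mismatched: the two UTF-8 bytes C3 A9 become two Latin-1 characters.
        String wrong = decodeSource(saved, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals("<p>caf\u00c3\u00a9</p>"));  // true
    }
}
```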

cheers
andrew
