Subject: Re: [xsl] using xsl:message with UTF-8 characters
From: "Andrew Welch" <andrew.j.welch@xxxxxxxxx>
Date: Mon, 23 Apr 2007 13:19:35 +0100
On 4/23/07, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > However, this won't solve your problem with xsl:message
> > (sorry), because Saxon seems to emit the messages of
> > xsl:message and fn:trace using
> > Latin-1 encoding or similar (I believe it were nicer if Saxon
> > would output in UTF-8, but maybe this is Sun Java's problem,
> > not Saxon's).
>
> I've done some further investigation, and it seems that Saxon 8.9 isn't
> actually working as designed here (it's fixed in my current development
> build, quite by chance, which confused matters). Java makes the decision
> what encoding to use for the output, and in my tests it is deciding to use
> CP1252 when running in my IDE (IntelliJ). Saxon should then find out from
> Java what encoding is being used, and replace all characters outside that
> encoding by XML character references. But in 8.9.0.3 the escaping of
> characters outside the character set supported by the output writer isn't
> happening: I will fix this. My example was poorly chosen because the three
> characters &#xaa;&#xba;&#x20ac; can all be represented in CP1252.
>
> I don't know how good Java is at getting the encoding right, for example
> whether it will use a different encoding if you use configuration options
> such as "cmd /u" identified by Abel. I'll do some experiments.

As far as I know, Java will use the "platform default encoding" unless
told otherwise, which on a Windows machine with a Western European or
US locale is CP1252.
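
To see which encoding your own JVM has picked, a small check along these lines should do. This is only a sketch: the class name ShowDefaultEncoding and the describe() helper are made up for illustration and have nothing to do with Saxon.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of Saxon.
public class ShowDefaultEncoding {

    // Builds a one-line description of the encoding the JVM is using by default.
    static String describe() {
        return "defaultCharset=" + Charset.defaultCharset().name()
             + " file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // Written to stderr, the stream the messages in this thread appear on.
        System.err.println(describe());
    }
}
```

Running it once from the IDE and once from a plain command prompt shows whether the two environments agree.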

If you want to set the default encoding to something else from the
command line, then you can use the "file.encoding" system property,
e.g.:

java -Dfile.encoding=UTF-8 -jar saxon8.jar .....
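
For what it's worth, the escaping Michael describes (replacing any character the output encoding cannot represent with an XML character reference) could be sketched roughly as below. This is not Saxon's code: the class name EscapeUnencodable and the escape() method are invented for illustration, and I've used ISO-8859-1 and US-ASCII rather than CP1252 because every JVM is guaranteed to support them.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical illustration, not Saxon's implementation.
public class EscapeUnencodable {

    // Replaces each code point the charset cannot encode with a hex
    // character reference such as &#x20ac; and leaves the rest untouched.
    static String escape(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
            i += Character.charCount(cp); // step over surrogate pairs as a unit
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "\u00aa\u00ba\u20ac"; // the three characters from the example
        // ISO-8859-1 has no euro sign, so only the last character is escaped.
        System.err.println(escape(sample, Charset.forName("ISO-8859-1")));
        // US-ASCII cannot represent any of the three, so all are escaped.
        System.err.println(escape(sample, Charset.forName("US-ASCII")));
    }
}
```

Passing Charset.defaultCharset() as the second argument would tie the escaping to whatever encoding Java chose for the platform.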

cheers
andrew
