RE: [xsl] using xsl:message with UTF-8 characters

From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 23 Apr 2007 12:52:02 +0100
> However, this won't solve your problem with xsl:message 
> (sorry), because Saxon seems to emit the messages of 
> xsl:message and fn:trace using
> Latin-1 encoding or similar (I believe it would be nicer if Saxon 
> output in UTF-8, but maybe this is Sun Java's problem, 
> not Saxon's). 

I've investigated further, and it seems that Saxon 8.9 isn't actually
working as designed here (it happens to be fixed in my current development
build, quite by chance, which confused matters). Java decides which encoding
to use for the output, and in my tests it chooses CP1252 when running in my
IDE (IntelliJ). Saxon should then find out from Java which encoding is in
use, and replace every character outside that encoding with an XML character
reference. But in 8.9.0.3 this escaping of characters that the output
writer's character set cannot represent isn't happening: I will fix this. My
example was poorly chosen because the three characters
&#xaa;&#xba;&#x20ac; can all be represented in CP1252. 
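
To illustrate the intended behaviour, here is a minimal sketch in plain Java.
It is not Saxon's actual code: the class name CharRefEscaper and the method
escape() are hypothetical, and it assumes the JDK provides the windows-1252
charset (CP1252). It asks a CharsetEncoder whether each character can be
encoded, and writes a hexadecimal character reference when it cannot.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper for illustration only; not taken from Saxon.
class CharRefEscaper {

    // Returns text with every code point that the charset cannot encode
    // replaced by an XML hexadecimal character reference.
    static String escape(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            String piece = text.substring(i, i + len);
            if (encoder.canEncode(piece)) {
                out.append(piece);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // U+0416 (a Cyrillic letter) is not in CP1252, so it is escaped.
        System.out.println(escape("\u0416", cp1252)); // prints &#x416;
        // U+00AA, U+00BA and U+20AC are all in CP1252, so nothing is escaped.
        System.out.println(escape("\u00aa\u00ba\u20ac", cp1252).length()); // prints 3
    }
}
```

This shows why the three characters in my example did not expose the
problem: none of them needs escaping under CP1252.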

I don't know how reliably Java picks the right encoding, for example
whether it will use a different encoding if you use configuration options
such as the "cmd /u" switch that Abel identified. I'll do some experiments.
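
One way to run such an experiment is to ask the JVM what it has chosen. The
sketch below uses only standard JDK calls; the class name
DefaultEncodingProbe is hypothetical. Note that
OutputStreamWriter.getEncoding() may report a historical name (for example
"Cp1252") rather than the canonical charset name, and that the values depend
on the platform, the JDK version and how the JVM was launched.

```java
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Hypothetical probe for illustration; run it from the IDE and from a
// console to compare the results.
class DefaultEncodingProbe {

    // An OutputStreamWriter created without an explicit charset uses the
    // JVM's default charset; this reports the name the writer gives for it.
    static String writerEncoding() {
        return new OutputStreamWriter(System.err).getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("writer encoding = " + writerEncoding());
    }
}
```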

In Saxon, xsl:message by default uses a Java Writer, whereas "normal" result
documents use a Java OutputStream. This means that xsl:message output is
sensitive to Java's decisions about the encoding of the output, whereas with
normal result documents the encoding is determined entirely by the
xsl:output serialization properties. If you write your own MessageEmitter
you can write to an OutputStream if you prefer, or to any other destination,
such as an Apache log4j logging service.
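
The difference between the two approaches can be sketched in plain Java. This
does not show Saxon's MessageEmitter interface; the class
ExplicitEncodingDemo and its method encodeMessage() are hypothetical. The
point is only that when you own the OutputStream you can wrap it in a Writer
with an encoding of your own choosing, instead of inheriting Java's default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo for illustration; not part of Saxon.
class ExplicitEncodingDemo {

    // Writes the message through a Writer layered over an OutputStream,
    // with the encoding fixed by the caller, and returns the bytes produced.
    static byte[] encodeMessage(String message, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // The euro sign takes three bytes in UTF-8 and one byte in CP1252.
        System.out.println(encodeMessage("\u20ac", StandardCharsets.UTF_8).length); // prints 3
        System.out.println(encodeMessage("\u20ac", Charset.forName("windows-1252")).length); // prints 1
    }
}
```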

Michael Kay
http://www.saxonica.com/ 
