Re: [xsl] using xsl:message with UTF-8 characters

Subject: Re: [xsl] using xsl:message with UTF-8 characters
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Mon, 23 Apr 2007 14:45:05 +0200
Andrew Welch wrote:
On 4/23/07, Michael Kay <mike@xxxxxxxxxxxx> wrote:

I don't know how good Java is at getting the encoding right, for example
whether it will use a different encoding if you use configuration options
such as "cmd /u" identified by Abel. I'll do some experiments.

As far as I know, Java will use the "platform default encoding" unless told otherwise, which on a Windows machine is CP1252.

If you want to set the default encoding to something else from the
command line, then you can use the "file.encoding" system property,
eg:

java -Dfile.encoding=UTF-8 -jar saxon8.jar .....


Aha, this really helps! The output already improves considerably if you set file.encoding to "IBM437" in a default command prompt, which correctly renders the xAA and xBA from the string "&#xaa;&#xba;&#x20ac;" in MK's example. It cannot represent x20AC (the euro sign, which is not available in IBM437), but at least it shows a question mark instead of garbage.
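For anyone who wants to see the IBM437 behaviour without Saxon in the picture, here is a minimal Java sketch. The class name Ibm437Demo and its helper are made up for this mail, and it assumes a JDK that ships the IBM437 charset (the standard full JDK does, as far as I know). It only shows how Java substitutes a question mark for characters a charset cannot represent, nothing more.

```java
import java.nio.charset.Charset;

public class Ibm437Demo {
    // Hypothetical helper: encode to the named charset and decode again,
    // so that any character the charset cannot represent shows up as the
    // charset's replacement ('?' for IBM437).
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        // String.getBytes(Charset) replaces unmappable characters with
        // the charset's default replacement byte.
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String sample = "\u00aa\u00ba\u20ac"; // the three characters from MK's example
        // What the JVM considers its default charset on this machine.
        System.out.println("default charset: " + Charset.defaultCharset());
        // Print the IBM437 bytes in hex: A6 A7 3F
        for (byte b : sample.getBytes(Charset.forName("IBM437"))) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
        // true: the euro sign did not survive, the other two did
        System.out.println(roundTrip(sample, "IBM437").equals("\u00aa\u00ba?"));
    }
}
```

The first two characters map to single bytes in IBM437, while the euro sign comes back as 3F, the question mark seen in the console.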


Now, to get Unicode output working in a console, take the four steps from my previous mail and add a fifth: -Dfile.encoding="UTF8" (or UTF-8). If you also want this to work with stdout, make sure the xsl:output encoding is UTF-8 as well (or omit it). Note, though, that you still can't use this approach in a batch file: any command in the batch file after 'chcp 65001' won't work.

In addition, this solves the problem of redirection and piping.
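As a rough illustration of why the encoding setting matters for what reaches the console or a redirected file, here is a small Java sketch. The class StreamEncodingDemo and its methods are hypothetical names of my own, and I am assuming (as Andrew describes) that the JVM encodes its console streams with the default encoding unless told otherwise. The sketch simply prints the same string through two PrintStreams with different encodings and compares the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class StreamEncodingDemo {
    // Hypothetical helper: print text through a PrintStream that uses the
    // given encoding and return the bytes that were actually written.
    static byte[] printedBytes(String text, String encoding) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, encoding);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown encoding: " + encoding, e);
        }
    }

    // Format bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "\u00aa\u00ba\u20ac\u0137"; // the characters from the stylesheet in my PS
        // UTF-8 can represent all four characters: C2 AA C2 BA E2 82 AC C4 B7
        System.out.println("UTF-8:        " + hex(printedBytes(sample, "UTF-8")));
        // CP1252 has no x0137, so it becomes '?': AA BA 80 3F
        System.out.println("windows-1252: " + hex(printedBytes(sample, "windows-1252")));
    }
}
```

The same characters end up as quite different byte sequences, so a console or file that expects one encoding and receives the other will show garbage.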

Interestingly, the console still messes things up. When the output string is supposed to be (see my PS, where I pasted the stylesheet I used, partially based on MK's string)

<t>V: ªº€ķ</t> (not sure the mailer keeps it intact)

the console instead outputs:

<t>V: ªº€ķ</t></t>>

I have no idea why. Other strings get similarly corrupted, but prove to be correct when redirected to a file.

Cheers,
-- Abel Braaksma


PS:


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version = "2.0"
xmlns:xsl = "http://www.w3.org/1999/XSL/Transform" >

<xsl:output omit-xml-declaration="yes" encoding="UTF8"/>

<xsl:template name="main">
<xsl:variable name="t">
<t>V: &#xaa;&#xba;&#x20ac;&#x0137;</t>
</xsl:variable>
<xsl:copy-of select="$t" />
<xsl:message select="$t" />
</xsl:template>

</xsl:stylesheet>
