Re: [xsl] using xsl:message with UTF-8 characters

Subject: Re: [xsl] using xsl:message with UTF-8 characters
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Mon, 23 Apr 2007 15:46:44 +0200
Andrew Welch wrote:

The simple rule is: always read and write using the same encoding, and be aware when something is converting between characters and bytes behind the scenes (servlets, for example). Make sure the font you're viewing the result in contains the glyphs for the characters you're trying to view (helpfully, the no-glyph character is often the same box or question mark used to mean no-mapping in the encoding... requiring a hex editor to check the underlying bytes), and be certain the viewer is showing the result in the right encoding (the cmd window here; the Eclipse output window is another notorious spot).

Indeed. Things tend to get awkward and downright confusing (especially for the unaware) when error messages are written in encoding X and displayed in encoding Y, while your system's stdout viewer displays in encoding A and is written to in encoding B.
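To make the X/Y mismatch concrete, here is a minimal Python sketch (not from the original thread; the helper name `round_trip` and the sample word are illustrative only). It encodes text with one encoding and decodes the bytes with another, showing both failure modes mentioned above: mojibake, and the question mark substituted for a character the target encoding cannot represent.

```python
def round_trip(text: str, write_enc: str, read_enc: str) -> str:
    """Encode text with write_enc, then decode the resulting bytes with read_enc.

    errors="replace" mimics what many tools do silently: unmappable characters
    become '?' when encoding, and undecodable bytes become U+FFFD when decoding.
    """
    data = text.encode(write_enc, errors="replace")
    return data.decode(read_enc, errors="replace")


if __name__ == "__main__":
    word = "café"
    # Same encoding on both sides: the text survives.
    print(round_trip(word, "utf-8", "utf-8"))    # café
    # UTF-8 bytes shown by a cp1252 viewer: the two bytes of 'é' become 'Ã©'.
    print(round_trip(word, "utf-8", "cp1252"))   # cafÃ©
    # The target encoding has no mapping for 'é', so it is replaced by '?'.
    print(round_trip(word, "ascii", "ascii"))    # caf?
```

Only the first call, where writer and reader agree, gives back the original text.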


With the quite explicit encoding handling of XML-related technologies (not explicit enough for my taste, though: in http://www.w3.org/TR/xslt20/#function-unparsed-text, points 1 and 2, external encoding information and the XML media-type rules, take precedence over point 3, the $encoding argument), the future looks better than the past. But if you have <xsl:output encoding="utf-8" /> and use the Console for your testing (which displays both stderr and stdout), you may run into unpleasant surprises. The rule, as you say, "input/output should all be the same, and be aware of implicit conversion", seems easy at first sight, but so much is involved, and there is so much history, that it can be quite hard to find out whether What You See is REALLY What You Get.
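A second hypothetical sketch models the Console problem: the result document is serialized as UTF-8 (as with <xsl:output encoding="utf-8" />), while xsl:message text reaches stderr in some other encoding. Using cp1252 for stderr is an assumption standing in for whatever default the processor or platform happens to use, and `console_view` is an illustrative helper, not part of any XSLT processor's API. Because the Console decodes both streams with a single encoding, no one setting shows both correctly.

```python
def console_view(stdout_bytes: bytes, stderr_bytes: bytes, console_enc: str) -> tuple[str, str]:
    """Decode both streams with the one encoding the console is using."""
    return (
        stdout_bytes.decode(console_enc, errors="replace"),
        stderr_bytes.decode(console_enc, errors="replace"),
    )


text = "café"
# Result document, serialized per <xsl:output encoding="utf-8"/>.
stdout_bytes = text.encode("utf-8")
# xsl:message output; cp1252 is an ASSUMED platform default for stderr.
stderr_bytes = text.encode("cp1252")

if __name__ == "__main__":
    for enc in ("utf-8", "cp1252"):
        out, err = console_view(stdout_bytes, stderr_bytes, enc)
        # ascii() escapes non-ASCII characters so the output prints on any terminal.
        print(f"console={enc}: stdout={ascii(out)} stderr={ascii(err)}")
```

Whichever encoding the console is set to, one of the two streams comes out garbled, which is the "What You See" problem in miniature.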

Cheers,
Abel
