Re: [xsl] I18N / UTF-8 versus US-ASCII

Subject: Re: [xsl] I18N / UTF-8 versus US-ASCII
From: "andrew welch" <andrew.j.welch@xxxxxxxxx>
Date: Tue, 4 Apr 2006 10:46:00 +0100
On 4/4/06, Sangal, Amit (STSD) <amit.sangal@xxxxxx> wrote:
> Hi,
>
> I'm facing some trouble using the Xalan-j 2.6.0 for transforming XML which
> contains Korean characters.
> When I use UTF-8 encoding, it makes these characters into garbled mess, like
> G;EM
> <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
> e.g
> <Dependencies>
> <Source>europeG;EM</Source>
> <Target>email_node3</Target>
> </Dependencies>
>
> But when output encoding is changed to US-ASCII, outcome is all right and I
> do not see any garbling of Korean characters.
> <xsl:output method="xml" encoding="US-ASCII" indent="yes"/>
> e.g.
> <Dependencies>
> <Source>europe&#54504;&#53552;</Source>
> <Target>email_node3</Target>
> </Dependencies>
> Is it ok to use US-ASCII encoding?

Yes, it makes life much easier when encoding is effectively taken out
of the equation.  I always use US-ASCII output encoding and let the
browser render the characters.  With US-ASCII the serializer writes
every non-ASCII character as a numeric character reference (&#54504;
and so on), so the output is plain ASCII.  It all gets far too painful
when your encoding gets lost as the XML travels through servlets etc.
and your glorious multibyte characters become single-byte rubbish.

I know I'm wrong, and we should all use UTF-8 output encoding and
ensure everything else is UTF-8 aware, but it's just easier not to.
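
To make the failure mode concrete, here is a minimal sketch in Python (not Xalan, and not anyone's actual pipeline) of what probably happens. It assumes some step downstream decodes the bytes as a single-byte encoding; Latin-1 is only a guess at which one. The sample string reuses the two code points from the US-ASCII output quoted above (54504 and 53552, i.e. U+D4E8 and U+D130).

```python
# Sample text: "europe" plus the two Hangul syllables from the quoted output.
text = "europe\ud4e8\ud130"

# UTF-8 output: each Hangul syllable is serialized as three bytes.
utf8_bytes = text.encode("utf-8")

# Assumed failure: a later step decodes those bytes as a single-byte
# encoding (Latin-1 here), so every syllable turns into three junk characters.
garbled = utf8_bytes.decode("latin-1")

# US-ASCII output: non-ASCII characters are written as numeric character
# references, so the byte stream contains nothing but ASCII.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")

# The same wrong decoding step is now harmless.
survived = ascii_bytes.decode("latin-1")

print(repr(garbled))
print(survived)
```

The character references survive because ASCII bytes mean the same thing in nearly every encoding a servlet or proxy is likely to assume. The price is a larger file and a less readable source.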
