RE: [xsl] I18N / UTF-8 versus US-ASCII

Subject: RE: [xsl] I18N / UTF-8 versus US-ASCII
From: "Sangal, Amit (STSD)" <amit.sangal@xxxxxx>
Date: Tue, 4 Apr 2006 16:07:17 +0530
I have limited knowledge in this area.

In my case too, XML (containing Korean characters) needs to travel from one
machine (running in the Korean/ko_KR locale) to another machine (running in
the English/en_US locale) inside a SOAP envelope.

Are there any known disadvantages or limitations of using US-ASCII output
encoding?

Regards,
Amit


-----Original Message-----
From: andrew welch [mailto:andrew.j.welch@xxxxxxxxx]
Sent: Tuesday, April 04, 2006 3:16 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] I18N / UTF-8 versus US-ASCII

On 4/4/06, Sangal, Amit (STSD) <amit.sangal@xxxxxx> wrote:
> Hi,
>
> I'm having trouble using Xalan-J 2.6.0 to transform XML which
> contains Korean characters.
> When I use UTF-8 encoding, it turns these characters into a garbled mess,
> like G;EM
> <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
> e.g.
> <Dependencies>
> <Source>europeG;EM</Source>
> <Target>email_node3</Target>
> </Dependencies>
>
> But when the output encoding is changed to US-ASCII, the outcome is all
> right and I do not see any garbling of Korean characters.
> <xsl:output method="xml" encoding="US-ASCII" indent="yes"/>
> e.g.
> <Dependencies>
> <Source>europe&#54504;&#53552;</Source>
> <Target>email_node3</Target>
> </Dependencies>
> Is it ok to use US-ASCII encoding?

Yes, it makes life much easier when encoding is effectively taken out
of the equation.  I always use us-ascii output encoding and let the
browser render the characters.  It all gets far too painful when your
encoding gets lost as the XML travels through servlets etc. and your
glorious multibyte characters become single-byte rubbish.

I know I'm wrong, and we should all use UTF-8 output encoding, and
ensure everything else is UTF-8 aware, but it's just easier not to.
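
To illustrate why the US-ASCII output survives the trip, here is a minimal
Python sketch (not Xalan; the element name and text are taken from the
example above, and the Latin-1 misreading is only an assumed stand-in for
whatever lost the encoding in the original setup). It serializes the same
element both ways and shows that the numeric character references parse back
to the same characters, while the UTF-8 bytes turn into rubbish once they are
decoded with the wrong charset.

```python
import xml.etree.ElementTree as ET

# The two Korean characters from the example: code points 54504 and 53552.
source = ET.Element("Source")
source.text = "europe" + chr(54504) + chr(53552)

# US-ASCII output: non-ASCII characters become numeric character references.
ascii_bytes = ET.tostring(source, encoding="us-ascii")

# UTF-8 output: the same characters become multibyte sequences.
utf8_bytes = ET.tostring(source, encoding="utf-8")

# Both serializations parse back to identical text.
ascii_text = ET.fromstring(ascii_bytes).text
utf8_text = ET.fromstring(utf8_bytes).text

# Assumed failure mode: some hop reads the bytes as Latin-1 instead of UTF-8.
# The pure-ASCII form is unaffected; the UTF-8 form is garbled.
ascii_misread = ascii_bytes.decode("latin-1")
utf8_misread = utf8_bytes.decode("latin-1")

print(ascii_bytes)
print(ascii_misread == ascii_bytes.decode("utf-8"))
print(utf8_misread == utf8_bytes.decode("utf-8"))
```

The trade-off is size and readability of the raw markup: each Korean
character takes three bytes in UTF-8 but eight bytes as a reference such as
&#54504;.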
