Re: [xsl] XSL and international characters

Subject: Re: [xsl] XSL and international characters
From: Mike Brown <mike@xxxxxxxx>
Date: Tue, 4 Dec 2001 13:33:40 -0700 (MST)
Marcin Kłos wrote:
> Well, I agree that those characters are in UTF-8 and that I wanted
> characters in UTF-8. The problem is that I passed one two-byte character
> as a parameter, and each of its two bytes was transformed again into a
> two-byte character, giving four bytes in total, i.e., two two-byte characters.
> 
> The original character was %C5%82, and the result was &Aring; as the first
> character and &#130; as the second :(

Everyone else seems to have missed your point. You are running into
an issue with an underspecified part of the URI, HTTP and HTML specs:
there is no standard mechanism for declaring what encoding is being
used when representing non-ASCII characters (bytes 0x80 and above) in the
%-escaped format used in URIs and HTML form data submissions.
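To make the ambiguity concrete, here is a small Java sketch (assuming Java 10 or later for the Charset overload of URLEncoder.encode, and a JVM that ships ISO-8859-2, which OpenJDK does but the spec does not require). The class and helper names are just for illustration. The same character comes out as different %-escapes depending on which encoding the client picked, and nothing in the escaped text says which one it was:

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EscapeDemo {
    // Percent-escape a string (form-data style) after encoding it with the given charset.
    static String escape(String s, Charset charset) {
        return URLEncoder.encode(s, charset);
    }

    public static void main(String[] args) {
        String lStroke = "\u0142"; // LATIN SMALL LETTER L WITH STROKE

        // In UTF-8 this character is the two bytes C5 82.
        System.out.println(escape(lStroke, StandardCharsets.UTF_8));

        // In ISO-8859-2 (Latin-2) it is the single byte B3.
        System.out.println(escape(lStroke, Charset.forName("ISO-8859-2")));
    }
}
```

The receiver just sees "%C5%82" or "%B3" and has to guess the encoding.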

Tomcat interprets %C5%82 in the HTTP request as the bytes C5 82, and
exposes them through the Java/JSP API as two chars in a String,
decoded according to an assumed (and probably wrong) iso-8859-1 encoding.
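You can reproduce the effect outside the container with java.net.URLDecoder. This is only a sketch of the same decoding behavior (assuming Java 10 or later); it is not what Tomcat literally calls internally, and the class name is illustrative:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class Latin1Decode {
    // Decode %-escapes the way a container assuming iso-8859-1 would:
    // every byte becomes one char with the same numeric value.
    static String decodeAsLatin1(String escaped) {
        return URLDecoder.decode(escaped, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = decodeAsLatin1("%C5%82");
        System.out.println(s.length()); // two chars, not one
        for (char c : s.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The two chars are U+00C5 and U+0082, which is exactly the &Aring; and &#130; you saw once the stylesheet serialized them.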

On the receiving end, you must convert these chars back into bytes,
assuming iso-8859-1, and then convert them to a String again, this
time assuming UTF-8. I did this in JSPs with WebLogic a while back,
and it was pretty straightforward. I'm not sure how it works with
your particular Tomcat/Cocoon setup, though.
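The round trip described above is only a couple of lines in Java. A minimal sketch, assuming the String you get back (e.g. from request.getParameter()) was decoded as iso-8859-1 and the browser actually sent UTF-8; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class Reinterpret {
    // Undo a wrong iso-8859-1 decode: recover the original bytes,
    // then decode those same bytes as UTF-8.
    static String latin1ToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What the container hands you for %C5%82 under the iso-8859-1 assumption.
        String fromContainer = "\u00C5\u0082";
        String fixed = latin1ToUtf8(fromContainer);
        System.out.println(fixed.equals("\u0142"));
    }
}
```

This works because iso-8859-1 maps every byte 00-FF to the char with the same value, so getBytes() gets the original bytes back without loss. If the container ever starts decoding as UTF-8 on its own, running the fix again would mangle the text, so apply it in exactly one place.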

   - Mike
____________________________________________________________________________
  mike j. brown, fourthought.com  |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  personal: http://hyperreal.org/~mike/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list