Subject: Re: [xsl] German character set problem (Umlaute)
From: Mike Brown <mike@xxxxxxxx>
Date: Thu, 19 Dec 2002 13:16:15 -0700 (MST)
Andreas Schlegel wrote:
> Hi,
>
> we have the following problem with our internet application.
> If the user enters "müller" in a pure HTML form, the server
> (Java servlets with Tomcat 4.0.3) gets "müller".

Not always. The encoding of the HTML document containing the form determines (by convention, not by any standard) how the form data is escaped and sent in the HTTP request to the servlet.

So if your HTML with the form contains

  <meta http-equiv="Content-Type" content="text/html;charset=utf-8">

and the user hasn't overridden the encoding in their browser, then the form is submitted with data encoded like m%C3%BCller, because the byte pair C3 BC is how ü is represented in UTF-8. If the form is iso-8859-1 encoded, then you get m%FCller, because the single byte FC is how ü is represented in iso-8859-1.

In the request, there's typically no indication of what encoding was used as the basis for the %-escaping. So when converting this data to a String for access in a "parameter" of the request, Tomcat has to guess, and it guesses iso-8859-1, last I checked. Someone correct me if they've changed it. ("Parameter" is a heavily overloaded term; I try not to use it when talking about HTML form data.)

So as long as your HTML form is iso-8859-1 encoded and the user isn't doing anything unusual, Tomcat tells you that it got a String like "m\u00FCller".

> If the user makes the same input in an HTML form which was generated by
> the TransformerFactory of the package javax.xml.transform (j2sdk1.4.0_01),
> the server receives the String "mÃ¼ller"!

Apparently your form is UTF-8 encoded, the browser knows that, and it is sending the data like m%C3%BCller. Tomcat doesn't know that UTF-8 was used, so it treats C3 and BC as two iso-8859-1 bytes that map to two separate characters, Ã and ¼.

Either change your transformation to output the HTML form as iso-8859-1, or have your servlet re-encode the String as iso-8859-1 bytes and then decode those bytes back into a String using UTF-8.
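Both symptoms and both fixes can be reproduced without a browser or a servlet container. The following is a minimal Java sketch, assuming (as described above) that the container decodes %-escaped form data as iso-8859-1. The class and method names (UmlautDemo, escape, decodeLikeTomcat, fixUtf8, useLatin1Html) are hypothetical; decodeLikeTomcat only imitates the container's guess and does not call any Tomcat code. It uses getBytes(Charset), which needs Java 6 or later; on 1.4 the getBytes("ISO-8859-1") overload does the same job.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

public class UmlautDemo {

    // How a browser %-escapes form data, given the encoding of the page
    // that contains the form (hypothetical helper).
    static String escape(String value, String pageCharset) {
        try {
            return URLEncoder.encode(value, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical stand-in for the container: it cannot tell which encoding
    // the browser used, so it decodes the %-escaped bytes as ISO-8859-1.
    static String decodeLikeTomcat(String escaped) {
        try {
            return URLDecoder.decode(escaped, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM supports ISO-8859-1
        }
    }

    // Second fix: ISO-8859-1 maps each char U+0000..U+00FF to exactly one byte,
    // so re-encoding recovers the bytes the browser sent; decode those as UTF-8.
    static String fixUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(Charset.forName("ISO-8859-1"));
        return new String(raw, Charset.forName("UTF-8"));
    }

    // First fix: make the transformation emit ISO-8859-1 instead of UTF-8
    // (stylesheet equivalent: <xsl:output method="html" encoding="ISO-8859-1"/>).
    static void useLatin1Html(Transformer t) {
        t.setOutputProperty(OutputKeys.METHOD, "html");
        t.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
    }

    public static void main(String[] args) throws Exception {
        String name = "m\u00FCller"; // U+00FC is u-umlaut

        String utf8Form = escape(name, "UTF-8");        // m%C3%BCller
        String latin1Form = escape(name, "ISO-8859-1"); // m%FCller
        System.out.println(utf8Form + " " + latin1Form);

        // iso-8859-1 form: the container's guess happens to be right.
        System.out.println(decodeLikeTomcat(latin1Form).equals(name)); // true

        // UTF-8 form: C3 and BC become two chars, U+00C3 and U+00BC.
        String garbled = decodeLikeTomcat(utf8Form);
        System.out.println(garbled.equals("m\u00C3\u00BCller")); // true
        System.out.println(fixUtf8(garbled).equals(name));      // true

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        useLatin1Html(identity);
        System.out.println(identity.getOutputProperty(OutputKeys.ENCODING)); // ISO-8859-1
    }
}
```

In a servlet, fixUtf8 would be applied to the value returned by request.getParameter(...), and only when the form page was actually served as UTF-8. Applied to a value that was already correctly decoded from an iso-8859-1 form, it would corrupt it, because the lone byte FC is not valid UTF-8.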
Mike

-- 
  Mike J. Brown   |  http://skew.org/~mike/resume/
  Denver, CO, USA |  http://skew.org/xml/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list