Subject: Re: [xsl] Special Characters in URLs
From: Mike Brown <mike@xxxxxxxx>
Date: Tue, 19 Jun 2001 10:58:25 -0600 (MDT)

Eriksson Magnus wrote:
> Yes, the URIs are interpreted by the Web Server/Web browser but I need them
> to be generated correctly by the XSLT processor -- to comply with the
> HTTP-standard (e.g. no white space in URLs). Is there a way to achieve this?

Re: the encoding: The encoding of the document as a whole has no bearing on
the %-style escaping of characters in a URI. So for example, if you have in
your stylesheet

  <xsl:output method="html" encoding="iso-8859-1"/>

and

  <a href="http://skew.org/printenv?greeting={greeting}">click</a>

and your XML has:

  <greeting>¡Hola!</greeting>

then your output should end up like:

  <a href="http://skew.org/printenv?greeting=%C2%A1Hola!">click</a>

You may have thought that the last 6 characters of that URI reference would
be bytes like:

  ¡  H  o  l  a  !
  A1 48 6F 6C 61 21   <-- iso-8859-1 bytes

because if you just did <xsl:value-of select="greeting"/> that is precisely
what you would get. The reason it changes when the XSLT processor emits it in
an href attribute is this clause in the XSLT spec: "The html output method
should escape non-ASCII characters in URI attribute values using the method
recommended in Section B.2.1 of the HTML 4.0 Recommendation". And that
section says to use UTF-8 as the basis for the %-escaping of the URI. The
UTF-8 encoding of the inverted exclamation mark (U+00A1) is the two bytes
C2 A1, so you likely get this in the output:

  %  C  2  %  A  1  H  o  l  a  !
  25 43 32 25 41 31 48 6F 6C 61 21   <-- iso-8859-1 bytes, still

See, you *did* get iso-8859-1 output like you asked for. The UTF-8-ness is
actually at a higher level of abstraction.

Note that this escaping happens *only* for non-ASCII characters (U-00000080
and higher). So it does not affect those ASCII characters that are reserved
or disallowed in a URI, like " ", among others.

Even if the XSLT processor failed to do the UTF-8 based escaping of non-ASCII
characters, the HTML user agents are supposed to do it when interpreting the
URI reference anyway.

Of course your problem is on the server end.
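
To make the byte-level picture concrete, here is a minimal sketch in Python rather than XSLT. It only mimics what the html output method is expected to do to the href value; it is not how any particular XSLT processor implements it. The variable names are mine, and `safe="!"` is an assumption chosen so that "!" stays literal as in the example URI.

```python
from urllib.parse import quote

# "\u00a1" is the inverted exclamation mark, so this is the <greeting> text.
greeting = "\u00a1Hola!"

# Percent-escape on a UTF-8 basis, as HTML 4.0 section B.2.1 recommends.
utf8_escaped = quote(greeting, safe="!", encoding="utf-8")

# The same text escaped on an iso-8859-1 basis, for comparison.
latin1_escaped = quote(greeting, safe="!", encoding="iso-8859-1")

# The escaped string is pure ASCII, so serializing the document as
# iso-8859-1 produces the same bytes that ASCII or UTF-8 would.
wire_bytes = utf8_escaped.encode("iso-8859-1")

print(utf8_escaped)                 # %C2%A1Hola!
print(latin1_escaped)               # %A1Hola!
print(wire_bytes.hex(" ").upper())  # 25 43 32 25 41 31 48 6F 6C 61 21
```

The last line is the point of the exercise: once the non-ASCII character has been escaped, nothing in the attribute value depends on the output encoding any more.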
Chances are, you are coding using an API that expects iso-8859-1 as the basis
for the URL escaping, which is perfectly reasonable to do, especially in
light of the fact that browsers tend to send URL-encoded form data with the
URL-escaping being based on the actual encoding of the document containing
the form (rather, the encoding that the browser is assuming the containing
document is using; this is user-overridable).

If you make the containing document utf-8 instead of iso-8859-1, you can
assume that all the escaping is UTF-8 based. You can then convert the
misinterpreted-as-iso-8859-1 strings you get from the form data API back to
iso-8859-1 bytes, and read those bytes back into a string using utf-8
interpretation.

Your other option is to avoid putting the raw non-ASCII characters in the URI
refs in the first place. If you absolutely must have %A1 for inverted
exclamation mark, then the only way to ensure this is to make your stylesheet
put %A1 in the result tree. You can do this using an extension function
(ideal) or with a clever recursive template.

Re: escaping of ASCII characters like " " (space), you must also control this
in your stylesheet. If you want "+" or "%20" (the latter is preferable), then
have your stylesheet explicitly put that in the result tree.

See also: http://skew.org/xml/misc/URI-i18n/

Hope this helps.

   - Mike
_____________________________________________________________________________
mike j. brown, software engineer at  |  xml/xslt: http://skew.org/xml/
webb.net in denver, colorado, USA    |  personal: http://hyperreal.org/~mike/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
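
As a rough illustration of the two options described in the message, here is a Python sketch. It is not an XSLT extension function and not tied to any real form-data API: the mis-decoding is simulated with `unquote`, and `escape_latin1` is a hypothetical helper name standing in for whatever extension function or recursive template would do the escaping on the stylesheet side.

```python
from urllib.parse import quote, unquote

# Option 1: repair on the server.
# Simulate a form-data API that assumes iso-8859-1 when it unescapes a
# query value that was actually escaped on a UTF-8 basis.
misread = unquote("%C2%A1Hola%21", encoding="iso-8859-1")

# Recover the original bytes, then reinterpret them as UTF-8.
repaired = misread.encode("iso-8859-1").decode("utf-8")


# Option 2: escape explicitly before the value reaches the URI.
def escape_latin1(value: str) -> str:
    """Percent-escape value on an iso-8859-1 basis (hypothetical helper).

    safe="" means spaces and reserved ASCII characters are escaped too,
    so a space comes out as %20. Characters outside iso-8859-1 raise
    UnicodeEncodeError.
    """
    return quote(value, safe="", encoding="iso-8859-1")


print(repr(misread))                          # '\xc2\xa1Hola!'
print(repaired == "\u00a1Hola!")              # True
print(escape_latin1("\u00a1Hola, mundo!"))    # %A1Hola%2C%20mundo%21
```

The repair in option 1 is only safe when every incoming value is known to be UTF-8 based, which is why the message suggests serving the containing document as utf-8 first.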