Subject: Re: [xsl] decoding percent-escaped octet sequences
From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@xxxxxxxxx>
Date: Sat, 21 May 2011 12:13:32 +0200
On 2011-05-20 17:52, Brandon Ibach wrote:
Generally, when you're doing string manipulations inside XSLT/XPath, there really is no such thing as ISO-8859-1, UTF-8 or any other encoding, since the "string" data type in XPath is just a string of Unicode characters. The encoding of the input is used to map the sequence of octets to Unicode characters on the way in and the requested encoding of the output is used to do the reverse on the way out.
Percent-escaping is sort of an exception since it is, really, a form of encoding, but not one that is generally handled automatically by the parser, serializer, etc. So, you may need to decode the percent-escapes, but you shouldn't have to worry about the overall encoding.
If you think your use case requires that you really do need to deal with encodings, please tell us a little more about it, so that we might be able to better suggest a solution. How is this string getting into your transform while still being encoded? ...
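To make the two layers concrete, here is a minimal sketch in Java (not XSLT), assuming that every unescaped character in the input is plain ASCII. The class and method names are made up for illustration, and malformed escapes are not handled gracefully. Step 1 turns percent-escapes into octets; step 2 maps those octets to Unicode characters under a named charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only.
final class PercentLayersSketch {

    // Step 1: percent-escapes -> raw octets.
    // Assumes unescaped characters are ASCII; a truncated or non-hex
    // escape makes substring/parseInt throw an unchecked exception.
    static byte[] toOctets(String escaped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '%') {
                out.write(Integer.parseInt(escaped.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c); // low 8 bits only, fine for ASCII
            }
        }
        return out.toByteArray();
    }

    // Step 2: octets -> Unicode characters, using the given charset.
    static String toChars(byte[] octets, String charsetName) {
        return new String(octets, Charset.forName(charsetName));
    }
}
```

With this sketch, `%A3` read as ISO-8859-1 and `%c2%a3` read as UTF-8 both end up as the same single character U+00A3; only the octets in between differ.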
The XSLT code reads an XML document containing test cases for HTTP header fields using a variety of encoding styles, some of which are the ones I mentioned (either ISO-8859-1 or UTF-8, percent-escaped).
The goal is to transform the escaped strings from the test cases to XSLT strings (Unicode sequences), essentially implementing the header field parsing in XSLT (yes, this is a proof-of-concept, nothing more).
Best regards, Julian
===========8<------------------------
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="my"
  xmlns:java-urldecode="java:java.net.URLDecoder">

  <!-- see comment below for ' escaping -->
  <xsl:variable name='input' as='xs:string*' select="(
    'us-ascii''en-us''This%20is%20%2A%2A%2Afun%2A%2A%2A',
    'iso-8859-1''en''%A3%20rates',
    'UTF-8''''%c2%a3%20and%20%e2%82%ac%20rates'
    )" />

  <my:input>
    <val>us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A</val>
    <val>iso-8859-1'en'%A3%20rates</val>
    <val>UTF-8''%c2%a3%20and%20%e2%82%ac%20rates</val>
  </my:input>
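For comparison, here is a rough sketch of what decoding these three values could look like when java.net.URLDecoder (the class bound to the java-urldecode prefix above) is called directly from Java. It assumes each value has the layout charset'language'escaped-value, as the test data suggests (this looks like the extended-value syntax of RFC 5987). The class and method names are hypothetical. One caveat: URLDecoder follows form-encoding rules and turns "+" into a space, so the sketch protects literal plus signs first.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, for illustration only.
final class ExtValueSketch {

    // Decodes a string of the assumed form charset'language'escaped-value.
    static String decode(String extValue) {
        // Limit 3 keeps an empty language tag, as in "UTF-8''...".
        String[] parts = extValue.split("'", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected charset'language'value: " + extValue);
        }
        String charset = parts[0];
        // URLDecoder would turn '+' into a space; escape it so it survives.
        String escaped = parts[2].replace("+", "%2B");
        try {
            return URLDecoder.decode(escaped, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }
}
```

Under those assumptions the three values decode to "This is ***fun***", "£ rates" and "£ and € rates". The language tag (parts[1]) is parsed but ignored here.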
--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer: Gerrit Imsieke, Svea Jelonek, Thomas Schmidt, Dr. Reinhard Vöckler