Re: [xsl] decoding percent-escaped octet sequences

On Fri, May 20, 2011 at 12:14 PM, Julian Reschke <julian.reschke@xxxxxx>
wrote:
> On 2011-05-20 17:52, Brandon Ibach wrote:
>> Generally, when you're doing string manipulations inside XSLT/XPath,
>> there really is no such thing as ISO-8859-1, UTF-8 or any other
>> encoding, since the "string" data type in XPath is just a string of
>> Unicode characters.

But Julian is right that a percent-encoded string, which represents a
byte sequence, can be considered to be encoded in one or another way.
I investigated this same kind of thing for the site I work on, and I
don't have a solution for how to convert these to strings inside XSLT,
but I thought I'd just paste some of the test cases I worked with, in
case they might prove interesting or useful.
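To make the byte-versus-character point concrete, here is a small Python sketch. It is not an XSLT solution, since I don't have one. It uses the standard library's `urllib.parse.unquote_to_bytes` to turn the escapes back into raw octets, and then decodes those octets with an explicitly chosen charset. `decode_term` is just an illustrative helper name.

```python
from urllib.parse import unquote_to_bytes


def decode_term(escaped: str, charset: str) -> str:
    """Percent-decode to raw octets, then read those octets in `charset`."""
    octets = unquote_to_bytes(escaped)  # "%C3%84" -> b"\xc3\x84"
    return octets.decode(charset)  # strict: invalid input raises UnicodeDecodeError


utf8_form = "%C3%84rzteblatt"   # test case 1 below
latin1_form = "%C4rzteblatt"    # test case 3 below

print(decode_term(utf8_form, "utf-8"))             # Ärzteblatt
print(decode_term(latin1_form, "iso-8859-1"))      # Ärzteblatt
print(repr(decode_term(utf8_form, "iso-8859-1")))  # 'Ã\x84rzteblatt' (wrong charset)

try:
    decode_term(latin1_form, "utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)                # not UTF-8: invalid continuation byte
```

The same text can therefore arrive as two different octet sequences, and the same octets can yield different text, depending only on which charset the decoder assumes.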

1. UTF-8 encoded single character
A. ?term=%C3%84rzteblatt
"Ärzteblatt"

2. Invalid character codes (ASCII control characters: the bytes decode the
same way as ISO-8859-1 or UTF-8, but U+0002 and U+0003 are not usable text
and are not allowed in XML 1.0)
A. ?term=%02%03cat

3. ISO-8859-1 (not UTF-8) encoded single character
A. ?term=%C4rzteblatt
"Ärzteblatt"

4. Invalid byte sequence (not valid UTF-8, and %83 is a control code rather
than a printable character in ISO-8859-1)
A. ?term=%C4%83%C4cat

5. Chinese characters, UTF-8 encoded
A. ?term=%e4%bd%a0%e5%a5%bd
Search box: "你好"

6. ISO-8859-1, multiple bytes: this sequence starts out looking like UTF-8
(%c4%A0 on its own is valid UTF-8 for U+0120), but it's not.
A. ?term=%c4%A0%c4rzteblatt
Search box: "Ä Ärzteblatt"
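One way to sort the test cases above mechanically is sketched below in Python. It assumes a simple heuristic that may well differ from what any real site does: decode the whole octet sequence as strict UTF-8, fall back to ISO-8859-1 if that fails, and treat any control character in the result as invalid. `CASES`, `classify` and `has_control_chars` are illustrative names only.

```python
import unicodedata
from urllib.parse import unquote_to_bytes

# Query values from test cases 1-6 above, without the "?term=" prefix.
CASES = [
    "%C3%84rzteblatt",
    "%02%03cat",
    "%C4rzteblatt",
    "%C4%83%C4cat",
    "%e4%bd%a0%e5%a5%bd",
    "%c4%A0%c4rzteblatt",
]


def has_control_chars(text: str) -> bool:
    # General category "Cc" covers the C0 controls, DEL and the C1 controls.
    return any(unicodedata.category(ch) == "Cc" for ch in text)


def classify(escaped: str) -> str:
    octets = unquote_to_bytes(escaped)
    try:
        # Decode the whole sequence strictly, so a valid UTF-8 prefix
        # (as in case 6) does not count as success.
        text, label = octets.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to U+0000..U+00FF, so this never raises.
        text, label = octets.decode("iso-8859-1"), "iso-8859-1"
    if has_control_chars(text):
        return "invalid (control characters)"
    return f"{label}: {text!r}"


for case in CASES:
    print(case, "->", classify(case))
```

With this heuristic, cases 1 and 5 decode as UTF-8, cases 3 and 6 fall back to ISO-8859-1, and cases 2 and 4 are rejected because they contain control characters (U+0002/U+0003 and U+0083 respectively). Because ISO-8859-1 assigns a character to every byte, the fallback can never fail by itself, so it will silently accept octets that were never meant to be ISO-8859-1.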

After working with this for a while, we concluded that it's best to
strictly enforce the rule that percent-encoding in URLs be UTF-8. In
other words, I think it's a bad idea to keep supporting ISO-8859-1
encoded URLs, because doing so just leads to too many problems that are
very hard to debug.
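A strict-UTF-8 policy like this can be sketched in Python with the standard `urllib.parse.parse_qs`, whose `errors` parameter defaults to `"replace"`. Passing `errors="strict"` makes invalid UTF-8 raise instead of silently producing U+FFFD. `parse_utf8_query` and `BadQueryEncoding` are hypothetical names, and the control-character check is an extra assumption beyond the UTF-8 rule itself.

```python
import unicodedata
from typing import Dict, List
from urllib.parse import parse_qs


class BadQueryEncoding(ValueError):
    """Hypothetical error: the query is not usable percent-encoded UTF-8."""


def parse_utf8_query(query: str) -> Dict[str, List[str]]:
    """Parse a query string (without the leading '?'), accepting UTF-8 only."""
    try:
        # parse_qs defaults to errors="replace", which would quietly turn
        # ISO-8859-1 octets into U+FFFD; "strict" makes them raise instead.
        params = parse_qs(query, encoding="utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise BadQueryEncoding(f"query is not UTF-8: {query!r}") from exc
    for values in params.values():
        for value in values:
            # Extra assumption beyond the UTF-8 rule: reject control characters,
            # which decode without error (test case 2) but are not usable text.
            if any(unicodedata.category(ch) == "Cc" for ch in value):
                raise BadQueryEncoding(f"control characters in query: {query!r}")
    return params


print(parse_utf8_query("term=%C3%84rzteblatt"))  # {'term': ['Ärzteblatt']}
for rejected in ["term=%C4rzteblatt", "term=%02%03cat"]:
    try:
        parse_utf8_query(rejected)
    except BadQueryEncoding as exc:
        print("rejected:", exc)
```

The query part of a full URL (without the `?`) can be obtained with `urllib.parse.urlsplit(url).query`. An application could then reject a request that raises `BadQueryEncoding` instead of guessing at an encoding.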

Current Thread

[xsl] decoding percent-escaped octet sequences
- Julian Reschke - 20 May 2011 15:34:52 -0000
  - Brandon Ibach - 20 May 2011 15:52:31 -0000
    - Julian Reschke - 20 May 2011 16:14:47 -0000
      - Chris Maloney - 20 May 2011 17:22:19 -0000 <=
      - Imsieke, Gerrit, le-tex - 21 May 2011 10:14:06 -0000
      - Imsieke, Gerrit, le-tex - 21 May 2011 10:20:23 -0000
      - Julian Reschke - 21 May 2011 17:35:37 -0000
  - Julian Reschke - 26 May 2011 08:56:22 -0000
