Re: [xsl] codepoints-to-string and string-to-codepoints only support Unicode. Why is that?

Subject: Re: [xsl] codepoints-to-string and string-to-codepoints only support Unicode. Why is that?
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Mon, 12 Feb 2007 16:03:14 +0100
Michael Kay wrote:
Apart from these questions, is anybody aware of a resolution to my problem? Most likely I need an extension function, am I right?

Yes indeed. It was never a design intention that the core function library should do everything that anyone might conceivably want.

Not even a general-purpose language can meet that demand, and you can't always make a car do the tango. Still, XSLT has quite extensive support for encodings, which is why I wonder why 'related' functions do not get the same level of support. Of course, 'related' is a relative term, and I consider the codepoints-to-string functions 'related' to encodings ;-)


I can see three ways to arrive at a generalized treatment of what I call "encoded codepoints":

1. Create translation tables for each encoding we wish to support.
2. Treat the codepoints as Unicode, serialize them to a non-XML format as ISO-8859-1, and read them back in as ISO-8859-X.
3. Use an extension function that does about the same as (2).


(2) and (3) avoid having to write translation tables, which I am very reluctant to do, as the host language (here: Java, with Saxon) already has these conversions available. So (1) seems pretty awkward, though it would likely be a fairly stable approach.

(2) sounds rather clumsy and risky, because it needs an extra step to serialize the data and parse it back in. I fear that might introduce unwanted side effects, and it would add a lot of complexity to our current XSLT design.

(3) may be my best option, but it gives up portability, something I have so far largely managed to avoid doing.
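To make (3) a bit more concrete, here is a minimal sketch of what the Java side of such an extension function might look like. It leans on the charset support the JDK already has, so no translation tables are needed. The class name EncodedCodepoints and the methods decode and encode are made up for illustration, and it assumes a single-byte encoding such as ISO-8859-X that the JVM actually ships. How the methods get bound to a function name in the stylesheet depends on the processor, and the int[] signature may need adapting to whatever the processor passes for a sequence of integers.

```java
import java.nio.charset.Charset;

// Hypothetical helper class; the name and the method signatures are assumptions.
public final class EncodedCodepoints {

    private EncodedCodepoints() {}

    // Treats each value (0..255) as one byte in the named single-byte
    // charset and returns the decoded Unicode string.
    public static String decode(int[] encoded, String charsetName) {
        byte[] bytes = new byte[encoded.length];
        for (int i = 0; i < encoded.length; i++) {
            if (encoded[i] < 0 || encoded[i] > 0xFF) {
                throw new IllegalArgumentException(
                        "not a single-byte value: " + encoded[i]);
            }
            bytes[i] = (byte) encoded[i];
        }
        return new String(bytes, Charset.forName(charsetName));
    }

    // Inverse direction: returns the byte values of the string in the named
    // charset. Characters the charset cannot represent are replaced by the
    // charset's default replacement byte (usually '?').
    public static int[] encode(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        int[] encoded = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            encoded[i] = bytes[i] & 0xFF; // undo Java's signed byte
        }
        return encoded;
    }

    public static void main(String[] args) {
        // 0xA4 is the euro sign in ISO-8859-15, i.e. U+20AC in Unicode.
        String s = decode(new int[] {0xA4}, "ISO-8859-15");
        System.out.println(Integer.toHexString(s.codePointAt(0))); // prints 20ac
    }
}
```

On the stylesheet side, the result of decode could then be handed to string-to-codepoints() to get the real Unicode codepoints, and encode would cover the opposite direction after codepoints-to-string().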

Perhaps someone has gone down this path already and would like to share their views?

Thanks,

-- Abel
