Re: [xsl] Preserving numeric character entity reference

From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Wed, 14 Feb 2007 22:06:53 +0100
Kjetil Kjernsmo wrote:
> Ah, right. And it is even a Recommendation now... :-) I use libxslt 1.1.17 with mod_xslt, so whatever capabilities they have. I assume that means XSLT 1.

Ah, that's a pity. From the rest of your post, I understand that you want to translate these values back and forth using XSLT, right? In XSLT 1 that is a hard and uncommon task, even though it may seem trivial at first sight. Mind you, the people behind the standards have learned a great deal from the problems of XSLT 1, and this, plus many other things that used to be impossible, is now part of the language in XSLT 2.


If there is no way I can talk you into using XSLT 2 (e.g., with the cross-platform Java-based processor Saxon 8, which is free and also available in a .NET version), then you are stuck with either a monstrosity of a template (OK, only about 92 characters need translation, so it can be done) or a simple enough extension function.

I am not sure whether libxslt supports EXSLT, or to what extent, but EXSLT has a replace function (str:replace() in its strings module) which, if supported, should come to the rescue. Please look around at http://www.exslt.org and test some functions with your processor (don't trust the support statements on their site; they are out of date).

> Then I put a CDATA section around just the character entities, and tried the &amp;, but how would I then go about translating it back?

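See above. This is the hard part in XSLT 1, because it has no string replace function, let alone codepoints-to-string(), which would be the preferred tool here. In XSLT 2 you would use a regular expression over any string containing escapes of the literal form &#123; (which look like &amp;#123; in the XML source). Note that replace() on its own is not enough: its replacement argument is evaluated once, as a plain string, before any matching happens, so it cannot call codepoints-to-string() on the captured group. xsl:analyze-string can (assuming the xs prefix is bound to http://www.w3.org/2001/XMLSchema):

<xsl:analyze-string select="$string" regex="&amp;#(\d+);"><xsl:matching-substring><xsl:value-of select="codepoints-to-string(xs:integer(regex-group(1)))"/></xsl:matching-substring>
<xsl:non-matching-substring><xsl:value-of select="."/></xsl:non-matching-substring></xsl:analyze-string>

That is a far cry from what it looks like in plain XSLT 1, with recursive templates and some kind of lookup table.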
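For comparison, here is a rough sketch of that XSLT 1 approach, assuming a standard XSLT 1.0 processor such as libxslt. A named template recurses over the string, finds the next literal &#...; escape, and resolves the number through a small lookup table stored in the stylesheet itself and read back with document(''). The template name decode-ncr, the lk namespace URI and the table entries are hypothetical; only decimal references are handled, and numbers missing from the table are passed through unchanged.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lk="urn:example:lookup"
    exclude-result-prefixes="lk">

  <!-- Hypothetical lookup table, stored as a top-level data element.
       Add one entry per codepoint you need (about 92 in your case). -->
  <lk:table>
    <lk:c n="46">.</lk:c>
    <lk:c n="64">@</lk:c>
    <lk:c n="233">&#233;</lk:c>
  </lk:table>

  <!-- document('') loads this stylesheet itself, so the table can be queried -->
  <xsl:variable name="table" select="document('')/*/lk:table/lk:c"/>

  <!-- Identity transform: copy elements and attributes unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes go through the decoder; priority 1 beats node() (-0.5) -->
  <xsl:template match="text()" priority="1">
    <xsl:call-template name="decode-ncr">
      <xsl:with-param name="s" select="."/>
    </xsl:call-template>
  </xsl:template>

  <!-- Replace each literal &#NNN; in $s by the character from the table -->
  <xsl:template name="decode-ncr">
    <xsl:param name="s"/>
    <xsl:variable name="rest" select="substring-after($s, '&amp;#')"/>
    <xsl:choose>
      <xsl:when test="contains($s, '&amp;#') and contains($rest, ';')">
        <xsl:variable name="num" select="substring-before($rest, ';')"/>
        <!-- Text before the escape is copied as-is -->
        <xsl:value-of select="substring-before($s, '&amp;#')"/>
        <xsl:choose>
          <xsl:when test="$table[@n = $num]">
            <xsl:value-of select="$table[@n = $num]"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- Not in the table: write the escape back out unchanged -->
            <xsl:value-of select="concat('&amp;#', $num, ';')"/>
          </xsl:otherwise>
        </xsl:choose>
        <!-- Recurse on whatever follows the semicolon -->
        <xsl:call-template name="decode-ncr">
          <xsl:with-param name="s" select="substring-after($rest, ';')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$s"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The table lookup compares strings, so a reference written with leading zeros (&#064;) will not match the entry for 64; hexadecimal references (&#x40;) would need a second branch.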

> Or should I go with a template approach, where I go into the XHTML fragment and do a value-of with disable-output-escaping?

Most people consider disable-output-escaping evil. Don't go down that path. What is more, processors are not required to honor d-o-e (especially when the result would be non-conforming XML), so if you can do without it, do without it.


One more thing. If you do decide to go the XSLT 2 way, then all you need for the obfuscation is xsl:character-map. The XML you create is still legal XML (stay away from illegal entities!), and when it is read back in, all values will look like their normal strings. That is also why this kind of obfuscation may not deter people much: any XML parser reads it back trivially.
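A minimal XSLT 2 sketch of that character-map approach, assuming the goal is to serialize selected characters (here @ and .) as numeric character references. The map name obfuscate and the choice of characters are made up; xsl:character-map, xsl:output-character and the use-character-maps attribute of xsl:output are standard XSLT 2.0.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Each mapped character is replaced at serialization time by the
       string attribute, which is written out without further escaping,
       so "&amp;#64;" appears in the result as &#64; -->
  <xsl:character-map name="obfuscate">
    <xsl:output-character character="@" string="&amp;#64;"/>
    <xsl:output-character character="." string="&amp;#46;"/>
  </xsl:character-map>

  <!-- The map applies to every occurrence in text nodes and attribute
       values, not only those inside e-mail addresses -->
  <xsl:output method="xml" use-character-maps="obfuscate"/>

  <!-- Identity transform: the tree is unchanged, only serialization differs -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With this stylesheet a text node containing a@b.c is written as a&#64;b&#46;c, which any XML parser turns straight back into a@b.c.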

Good luck with your efforts,

Cheers,
-- Abel Braaksma
