Re: [xsl] Unicode and child element

Subject: Re: [xsl] Unicode and child element
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Fri, 29 Aug 2008 07:32:52 -0400
Just a quick postscript to my earlier post for users of the archive.

At 2008-08-28 09:05 -0400, I wrote (modified):
I get the impression that you have copied a partial solution to your problem because only one step is missing, yet the result evidence you present seems not to match what the code you have would produce. I'll just focus my answer on your code and not on your evidence.

At 2008-08-28 17:51 +0530, Pankaj Chaturvedi wrote:
<xsl:function name="my:reverse-string">
        <xsl:param name="arg"/>
                <xsl:sequence
select="replace(codepoints-to-string(string-to-codepoints($arg)),
'\[#x([0-9A-Za-z]+)\]', '&#xE000;#x$1;')"/>
</xsl:function>

Okay, the above does *not* put "[#xUUUU]" into the result tree as you contend.

It puts "&#xE000;#xUUUU;" into the result tree, where &#xE000; is the single private-use character U+E000.

I gather you want "&#xUUUU;" in the serialized version of the result tree.

Using &#xE000; as a placeholder is a typical way of applying output character maps: at serialization the placeholder is replaced by a string, such as a bare "&", that default serialization would never produce in the output file.
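For archive readers who have not seen the technique, here is a minimal sketch of how the pieces fit together in XSLT 2.0. It is not the original poster's full stylesheet: the "urn:x-example:my" namespace, the map name "my:amp" and the function name "my:mark-references" are my own illustrative choices, and I have assumed the intent is hexadecimal digits only.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:x-example:my">

  <!-- serialize the result through the character map declared below -->
  <xsl:output method="xml" use-character-maps="my:amp"/>

  <!-- every U+E000 in the result tree is written out as a bare "&" -->
  <xsl:character-map name="my:amp">
    <xsl:output-character character="&#xE000;" string="&amp;"/>
  </xsl:character-map>

  <!-- hypothetical helper: "[#xUUUU]" becomes U+E000 followed by "#xUUUU;" -->
  <xsl:function name="my:mark-references" as="xs:string">
    <xsl:param name="arg" as="xs:string"/>
    <xsl:sequence
      select="replace($arg, '\[#x([0-9A-Fa-f]+)\]', '&#xE000;#x$1;')"/>
  </xsl:function>

  <!-- identity transform for everything that is not a text node -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- text nodes get the placeholder injected; the priority avoids a
       conflict with node() in the identity template above -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="my:mark-references(.)"/>
  </xsl:template>

</xsl:stylesheet>
```

The result tree holds U+E000 followed by "#xUUUU;", and only at serialization does the character map turn that U+E000 into "&", giving "&#xUUUU;" in the output file.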

Perhaps Tony Graham could correct me if I'm wrong here, but I've stopped using &#xE000; in my code and in my examples when presenting this technique to my students because that is a Unicode private-use character that *might* very well have meaning between two users who are exchanging documents. That means there might actually be user data in the XML documents that includes such a private-use character agreed upon between the two parties. XSLT serialization cannot distinguish a character that came from user data from the same character injected by the stylesheet: it translates every occurrence found in the output stream.

The Unicode characters &#xE000; through &#xF8FF; are private-use characters: the sender and receiver may agree between themselves on any meaning for them.

The Unicode characters &#xFDD0; through &#xFDEF; are specifically "non-characters", which means they must not be used to represent characters in a data stream between sender and receiver. This means that two trading partners must not use them in XML documents, which makes them available for XSLT users for this character mapping technique without interfering with user data.

I agree that using the private-use characters presents only an infinitesimally small chance of corrupting user data, since I'm not aware of people actually using private-use characters for interchange. Nevertheless, using non-characters for this XSLT stylesheet character mapping seems to me to be better guidance than using private-use characters.
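As a sketch of what the non-character variant might look like, the stylesheet below maps &#xFDD0; instead of &#xE000;. The "urn:x-example:my" namespace and the names "my:marker", "my:amp" and "my:mark-references" are hypothetical, chosen only for illustration, and I have assumed hexadecimal digits in the bracketed notation.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:x-example:my">

  <xsl:output method="xml" use-character-maps="my:amp"/>

  <!-- the placeholder is a Unicode non-character, so it should never
       occur in interchanged user data -->
  <xsl:variable name="my:marker" as="xs:string" select="'&#xFDD0;'"/>

  <!-- the map must name the same character as the variable above -->
  <xsl:character-map name="my:amp">
    <xsl:output-character character="&#xFDD0;" string="&amp;"/>
  </xsl:character-map>

  <!-- hypothetical helper: "[#xUUUU]" becomes the marker plus "#xUUUU;" -->
  <xsl:function name="my:mark-references" as="xs:string">
    <xsl:param name="arg" as="xs:string"/>
    <xsl:sequence
      select="replace($arg, '\[#x([0-9A-Fa-f]+)\]',
                      concat($my:marker, '#x$1;'))"/>
  </xsl:function>

  <!-- identity transform for everything that is not a text node -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- text nodes get the marker injected; any U+E000 already present in
       the user data passes through serialization untouched -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="my:mark-references(.)"/>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the marker in one variable means the function never hard-codes the placeholder, so switching to another non-character touches only the variable and the character map.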

I hope this is considered helpful.

. . . . . . . . . . . . Ken

--
Upcoming XSLT/XSL-FO hands-on courses:      Wellington, NZ 2009-01
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
