Re: [xsl] Unicode and child element

Subject: Re: [xsl] Unicode and child element
From: Tony Graham <Tony.Graham@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 29 Aug 2008 13:49:49 +0100
On Fri, Aug 29 2008 13:06:32 +0100, davidc@xxxxxxxxx wrote:

> Ken,
>
>> The Unicode characters &#xFDD0; through &#xFDEF; are specifically 
>> "non-characters", which means they must not be used to represent 

Everybody's understanding of these code points has been changing over
the years.  They are not singled out in either the Unicode Standard,
Version 2.0, or the Unicode Standard, Version 3.0 as being blessed (or
damned) as never having a character assigned to them.  I don't have time
to trace when they became special, but they are so mentioned in the
Unicode Standard, Version 5.0, and in the draft XML 1.0 Fifth Edition
[1].
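
For concreteness, here is a minimal Python sketch of which code points are
at issue. It assumes the set listed in the Unicode Standard, Version 5.0:
U+FDD0 through U+FDEF, plus the last two code points of each of the 17
planes. The name is_noncharacter is made up for illustration and does not
come from any library.

```python
# Noncharacters as assumed here (per the Unicode Standard, Version 5.0):
# U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF for each of the 17 planes.

def is_noncharacter(cp: int) -> bool:
    """Return True if code point cp is a Unicode noncharacter."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    in_fdd0_block = 0xFDD0 <= cp <= 0xFDEF
    ends_a_plane = (cp & 0xFFFE) == 0xFFFE  # low 16 bits are FFFE or FFFF
    return in_fdd0_block or ends_a_plane


# Count them across the whole code space: 32 in the block, 2 per plane.
noncharacter_count = sum(is_noncharacter(cp) for cp in range(0x110000))
```

The 32 code points in the U+FDD0 block are the ones proposed for the
character mapping technique; the plane-final pairs are included only so the
count is complete.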

>> characters in a data stream between sender and receiver.  This means 
>> that two trading partners must not use them in XML documents, which 
>> makes them available for XSLT users for this character mapping 
>> technique without interfering with user data.
>
> I'm not sure I see it that way, these non characters are not actually
> banned by XML systems so (like private use characters) their use (or non
> use) is constrained by convention rather than technology.
>
> Given that XSLT files are XML, it would seem that this convention would
> suggest that they not be used here as well. If you say that the
> convention will ensure that this character definitely won't appear in an
> XML source file, what happens if someone uses the XSLT document as
> input?

Naturally the use of such an unconventional feature would be thoroughly
commented in the XSLT document!

If you don't understand a noncharacter, you can delete it:

   If a noncharacter that does not have a specific internal use is
   unexpectedly encountered in processing, an implementation may signal
   an error or delete or ignore the noncharacter. If these options are
   not taken, the noncharacter should be treated as an unassigned code
   point. For example, an API that returned a character property value
   for a noncharacter would return the same value as the default value
   for an unassigned code point. [2]
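
The options in that passage could be sketched in Python roughly as follows.
This is an illustration only: handle_noncharacters and its policy names are
hypothetical, and for brevity the pattern covers just U+FDD0 through U+FDEF,
not the plane-final noncharacters.

```python
import re

# Only U+FDD0..U+FDEF; the U+nFFFE/U+nFFFF noncharacters are left out here.
_NONCHAR_RE = re.compile("[\uFDD0-\uFDEF]")


def handle_noncharacters(text: str, policy: str = "delete") -> str:
    """Apply one of the quoted options to text (hypothetical helper)."""
    if policy == "error":
        # Option 1: signal an error on the first noncharacter found.
        match = _NONCHAR_RE.search(text)
        if match:
            raise ValueError(
                "unexpected noncharacter U+%04X at offset %d"
                % (ord(match.group()), match.start())
            )
        return text
    if policy == "delete":
        # Option 2: drop the noncharacters and keep everything else.
        return _NONCHAR_RE.sub("", text)
    if policy == "keep":
        # Fallback: pass it through like any unassigned code point.
        return text
    raise ValueError("unknown policy: %r" % (policy,))
```

A stylesheet that relies on these code points would want every piece of
software in the chain to use the "keep" behaviour until the character map
has been applied.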

I'd expect most (all, really) XSLT processors to handle the
noncharacters, since as you point out, they are allowed in XML (even if
they could be frowned upon in future).
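
To check the "allowed in XML" point, here is a small Python sketch that
tests code points against the Char production of XML 1.0. The ranges are
copied from that production; the names XML10_CHAR_RANGES and is_xml10_char
are made up for this example.

```python
# XML 1.0:
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML10_CHAR_RANGES = (
    (0x9, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return any(lo <= cp <= hi for lo, hi in XML10_CHAR_RANGES)


# Every code point in U+FDD0..U+FDEF falls inside [#xE000-#xFFFD].
fdd0_block_allowed = all(is_xml10_char(cp) for cp in range(0xFDD0, 0xFDF0))
```

By this production the whole U+FDD0 block is legal in an XML 1.0 document,
while U+FFFE and U+FFFF are not.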

>> I'm not aware of people actually using private characters for
>> interchange
>
> We (in MathML 1) got our figures severely wrapped

with a bow of a ship?

> for using private use
> characters for math (because the math characters were not added until
> Unicode 3.1 and 3.2, many years later) on the grounds that specifying
> "private" uses for private use characters would greatly hamper usage of
> the standard in Asia, particularly, where apparently a certain well
> known operating system used large chunks of the private use area for
> commonly used document code pages.....

And a large font company had corralled a different chunk.

Perhaps you should have volunteered for John Cowan's ConScript registry
at the time.

>> Nevertheless using non-characters for this XSLT stylesheet character
>> mapping seems to me to be better guidance than using private-use
>> characters. 
>
> But could be construed as breaking the convention that says that
> non-characters not be used in documents.

Unicode discourages their use in interchange, which is not quite the
same as never using them, though somehow "interchange" isn't defined in
the Unicode glossary.

XML 1.0 Fifth Edition goes/will go only so far as to say they are
"discouraged".

So presumably it's okay to use them between consenting pieces of
software.

Regards,


Tony Graham                         Tony.Graham@xxxxxxxxxxxxxxxxxxxxxx
Director                                  W3C XSL FO SG Invited Expert
Menteith Consulting Ltd
XML, XSL and XSLT consulting, programming and training
Registered Office: 13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
Registered in Ireland - No. 428599   http://www.menteithconsulting.com
  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
xmlroff XSL Formatter                               http://xmlroff.org
xslide Emacs mode                  http://www.menteith.com/wiki/xslide
Unicode: A Primer                               urn:isbn:0-7645-4625-2


[1] http://www.w3.org/TR/2008/PER-xml-20080205/
[2] http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf
    from http://www.unicode.org/versions/Unicode5.1.0/
