Subject: Re: [xsl] Unicode and child element
From: Tony Graham <Tony.Graham@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 29 Aug 2008 13:49:49 +0100
On Fri, Aug 29 2008 13:06:32 +0100, davidc@xxxxxxxxx wrote:
> Ken,
>
>> The Unicode characters  through  are specifically
>> "non-characters", which means they must not be used to represent

Everybody's understanding of these code points has been changing over
the years.  They are not singled out in either the Unicode Standard,
Version 2.0, or the Unicode Standard, Version 3.0 as being blessed (or
damned) as never having a character assigned to them.

I don't have time to trace when they became special, but they are so
mentioned in the Unicode Standard, Version 5.0, and in the draft XML 1.0
Fifth Edition [1].

>> characters in a data stream between sender and receiver.  This means
>> that two trading partners must not use them in XML documents, which
>> makes them available for XSLT users for this character mapping
>> technique without interfering with user data.
>
> I'm not sure I see it that way, these non characters are not actually
> banned by XML systems so (like private use characters) their use (or non
> use) is constrained by convention rather than technology.
>
> Given that XSLT files are XML, it would seem that this convention would
> suggest that they not be used here as well. If you say that the
> convention will ensure that this character definitely won't appear in an
> XML source file, what happens if someone uses the XSLT document as
> input?

Naturally the use of such an unconventional feature would be thoroughly
commented in the XSLT document!

If you don't understand a noncharacter, you can delete it:

   If a noncharacter that does not have a specific internal use is
   unexpectedly encountered in processing, an implementation may
   signal an error or delete or ignore the noncharacter.  If these
   options are not taken, the noncharacter should be treated as an
   unassigned code point.  For example, an API that returned a
   character property value for a noncharacter would return the same
   value as the default value for an unassigned code point. [2]

I'd expect most (all, really) XSLT processors to handle the
noncharacters, since as you point out, they are allowed in XML (even if
they could be frowned upon in future).

>> I'm not aware of people actually using private characters for
>> interchange
>
> We (in MathML 1) got our figures severely wrapped

with a bow of a ship?

> for using private use
> characters for math (because the math characters were not added until
> Unicode 3.1 and 3.2, many years later) on the grounds that specifying
> "private" uses for private use characters would greatly hamper usage of
> the standard in Asia, particularly, where apparently a certain well
> known operating system used large chunks of the private use area for
> commonly used document code pages.....

And a large font company had corralled a different chunk.

Perhaps you should have volunteered for John Cowan's ConScript registry
at the time.

>> Nevertheless using non-characters for this XSLT stylesheet character
>> mapping seems to me to be better guidance than using private-use
>> characters.
>
> But could be construed as breaking the convention that says that
> non-characters not be used in documents.

Unicode discourages their use in interchange, which is not quite the
same as never using them, though somehow "interchange" isn't defined in
the Unicode glossary.

XML 1.0 Fifth Edition goes/will go only so far as to say they are
"discouraged".

So presumably it's okay to use them between consenting pieces of
software.

Regards,

Tony Graham
Tony.Graham@xxxxxxxxxxxxxxxxxxxxxx
Director                                  W3C XSL FO SG Invited Expert
Menteith Consulting Ltd
XML, XSL and XSLT consulting, programming and training
Registered Office: 13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
Registered in Ireland - No. 428599   http://www.menteithconsulting.com
  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
xmlroff XSL Formatter                               http://xmlroff.org
xslide Emacs mode                  http://www.menteith.com/wiki/xslide
Unicode: A Primer                               urn:isbn:0-7645-4625-2

[1] http://www.w3.org/TR/2008/PER-xml-20080205/
[2] http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf from
    http://www.unicode.org/versions/Unicode5.1.0/
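
The "delete or ignore" option in the passage cited as [2] can be sketched in a few lines of Python. This is only an illustration, not anything from the thread: the function names `is_noncharacter` and `strip_noncharacters` are hypothetical, and the sketch assumes the set of 66 noncharacters as the Unicode Standard defines it (U+FDD0 through U+FDEF, plus the last two code points of each of the 17 planes).

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if code point cp is one of the 66 Unicode noncharacters.

    Assumption: the set is U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF for
    every plane n from 0 to 16 (hex 0x10).
    """
    if not 0 <= cp <= 0x10FFFF:
        return False
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The low 16 bits are FFFE or FFFF: the last two code points of a plane.
    return (cp & 0xFFFE) == 0xFFFE


def strip_noncharacters(text: str) -> str:
    """Delete every noncharacter from text, leaving all other characters."""
    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))
```

A receiver that does not know what a sender meant by a noncharacter could run incoming text through `strip_noncharacters`, while a pair of cooperating programs that have agreed on a private meaning would skip that step.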