Subject: Re: [xsl] Unicode and child element
From: Tony Graham <Tony.Graham@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 29 Aug 2008 16:09:44 +0100

On Fri, Aug 29 2008 16:01:59 +0100, davidc@xxxxxxxxx wrote:
>> So presumably it's okay to use them between consenting pieces of
>> software.
>
> yes but
>
> either that's true, in which case it is OK for Ken to use them in
> XSLT, except that it means his assumption that they never occur in XML
> input is not valid, as other people may consent to use these characters
> as well.

I really meant your own software. Other people shouldn't send you
anything with those code points. You are allowed to strip them if
you're not expecting to see them. Your average XML software isn't going
to do that automagically, so you'd have to set up some sort of filter if
you expected that you might see some. You can then put in other
instances of a noncharacter (you'd think that Unicode had enough hyphens
that they could spare one for this word) in your internal processing,
but they should come out before you interchange the XML with anybody
else.

> or it is not true, in which case Ken's assumption that they do not
> appear in input documents is valid, but it means that he can't use them
> in XSLT either.

You could use them in your own XSLT, but you shouldn't be wanting to
interchange them with just anybody. For you to put them in, say, an RSS
feed would not be so fine.

> So either way I don't think they should be used (even though they work)
> and using private use characters is safer (especially if you wander up
> into the higher planes where there will be less legacy usage)

Using either is fraught with difficulty and requires all involved to
know what the code points mean.

If receiving a particular character in your input is going to break your
processing -- irrespective of whether it's a private use character or
noncharacter -- then you'd take precautions in proportion to how badly
it would affect you. If it would cripple your multi-million dollar
business, then you'd take different precautions than if it was a one-off
XML file for a one-off stylesheet that you were just playing with.

For standards, the story is a bit different. According to the Character
Model for the World Wide Web [1]:

   C070 [S] Specifications SHOULD NOT arbitrarily exclude code points
   from the full range of Unicode code points from U+0000 to U+10FFFF
   inclusive.

   C079 [S] Specifications SHOULD NOT allow the use of codepoints
   reserved by Unicode for internal use.

   C038 [S] Specifications MUST NOT require the use of private use area
   characters with particular assignments.

   C039 [S] Specifications MUST NOT require the use of mechanisms for
   defining agreements of private use code points.

   C040 [S] [I] Specifications and implementations SHOULD NOT disallow
   the use of private use code points by private agreement.

C079 could be an argument for never using noncharacters, but what about
for internal use?

And for your content:

   C073 [C] Publicly interchanged content SHOULD NOT use codepoints in
   the private use area.

So you can send me a stylesheet with private use characters in it, but
you shouldn't put the same stylesheet on your public web site.

Regards,


Tony Graham
Tony.Graham@xxxxxxxxxxxxxxxxxxxxxx
Director
W3C XSL FO SG Invited Expert
Menteith Consulting Ltd
XML, XSL and XSLT consulting, programming and training
Registered Office: 13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
Registered in Ireland - No. 428599
http://www.menteithconsulting.com
  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
xmlroff XSL Formatter                               http://xmlroff.org
xslide Emacs mode                  http://www.menteith.com/wiki/xslide
Unicode: A Primer                               urn:isbn:0-7645-4625-2

[1] http://www.w3.org/TR/2005/REC-charmod-20050215
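
The "some sort of filter" mentioned in the message could look roughly like the following. This is only a minimal sketch in Python, not anything from the thread: the function names (is_noncharacter, is_private_use, strip_noncharacters, find_private_use) are hypothetical, and it assumes the text has already been decoded to a Unicode string before any XML parsing. It strips the 66 Unicode noncharacters and merely reports private use code points, since those may be in use by private agreement.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters.

    These are U+FDD0..U+FDEF plus the last two code points of each of
    the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFF).
    """
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_private_use(cp: int) -> bool:
    """True for the BMP private use area and planes 15 and 16."""
    return (
        0xE000 <= cp <= 0xF8FF
        or 0xF0000 <= cp <= 0xFFFFD
        or 0x100000 <= cp <= 0x10FFFD
    )


def strip_noncharacters(text: str) -> str:
    """Return text with every noncharacter removed (hypothetical helper)."""
    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))


def find_private_use(text: str) -> list:
    """Return the distinct private use code points found, as U+XXXX labels."""
    return sorted(
        {"U+%04X" % ord(ch) for ch in text if is_private_use(ord(ch))}
    )


if __name__ == "__main__":
    incoming = "a\ufdd0b\U0001fffec\ue000"
    cleaned = strip_noncharacters(incoming)
    print(repr(cleaned))
    print(find_private_use(cleaned))
```

Whether to strip, reject, or just log is the "precautions in proportion" question from the message; the sketch strips noncharacters on input and leaves the policy for private use characters to the caller.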