Subject: Re: [xsl] Output: XML to XML scrambling unicode characters
From: David Carlisle <davidc@xxxxxxxxx>
Date: Mon, 4 Mar 2002 22:15:40 GMT
> If I use these symbols, I must add "&" before and ";" after. It was
> my assumption that "& #233;" was not any different than these. This
> is the reason why I called "& #233; a utf-8 rendering of "e acute".

No, this is not so. You can access a space by "&#32;" and a tab by "&#9;". Writing "&#9;" is just an XML reference to the character, and the character data after the XML parse is the same as if the character had been typed directly. Actually, in the case of white space the rules are a bit different, as white space normalisation can affect edge cases, but for a non-white-space character like e acute used in character data, you never need to use a character reference if the character is in the encoding.

> is the reason why I called "& #233; a utf-8 rendering of "e acute".

Doing so leads to confusion, though. Text encodings relate to the text stream and do not relate to XML syntax at all.

So, for example, latin1 (iso-8859-1) is an encoding in which every character takes up exactly one byte, and some positions are unencoded, so only about two hundred characters are available. Enough for western Europe, mostly. If you have a plain text latin1 file you are restricted to just those characters, and if you want to write, say, Polish, you have to switch to a different encoding (latin2).

However, in XML you can always refer to any of the characters in Unicode (i.e. numbers up to hex 10FFFF) using the &# notation, whatever encoding the file is in. This notation always uses the same Unicode numbers and so is independent of the encoding used (utf8, latin1, etc.), except of course that it depends on how that encoding represents the symbols & ; # x 0-9 a-f A-F, which are the characters actually used in the syntax.

So if you want to force your processor to use &# syntax as much as possible, you need to specify an encoding that includes as few characters as possible. The default utf8 encoding includes all of Unicode. Some processors let you use iso-8859-1 on output, in which case anything above xFF will have to be output using &# notation.
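A rough sketch of both points, assuming Python's standard-library xml.etree.ElementTree as a stand-in for an XSLT processor (that substitution is an assumption made for illustration; other serializers may behave differently). It checks that a literal character and its character reference parse to the same data, then serializes one document in three encodings. The sample strings and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same element written two ways: a literal e acute and the numeric
# character reference &#233;. After the parse the character data is the same.
literal = ET.fromstring("<p>caf\u00e9</p>")
reference = ET.fromstring("<p>caf&#233;</p>")
same_after_parse = literal.text == reference.text

# Sample data: e acute (U+00E9, inside latin1) and l with stroke
# (U+0142, outside latin1 but of course inside Unicode).
doc = ET.fromstring("<p>\u00e9 \u0142</p>")

# The narrower the output encoding, the more characters the serializer
# has to fall back to writing as &#...; references.
as_utf8 = ET.tostring(doc, encoding="utf-8")         # both written as raw bytes
as_latin1 = ET.tostring(doc, encoding="iso-8859-1")  # only U+0142 becomes a reference
as_ascii = ET.tostring(doc, encoding="us-ascii")     # both become references

print(same_after_parse)
print(as_utf8)
print(as_latin1)
print(as_ascii)
```

Whichever of the three outputs is parsed again, the resulting character data is the same; only the bytes in the file differ.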
Some processors let you use us-ascii, in which case everything above 127 will be output using &# notation. Note, however, that if your XML file uses any of these characters in element names, such encodings cannot be used: you cannot write <&#233;> as an element, so the text encoding used must include all characters used in element and attribute names.

David

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list