Subject: Re: [xsl] Namespace Identifiers - URI, URN, URL?
From: "Michael Beddow" <mbnospam@xxxxxxxxxxx>
Date: Wed, 29 Aug 2001 17:25:00 +0100
Sorry about the previous empty posting!

> No. Just be sure your document uses only UTF-8 characters if you
> don't put it, because that's the default character set defined by the
> XML Recommendation. Any non-UTF-8 character sequences in your XML
> document (such as extended ASCII/ISO-8859-1 characters) will cause
> your XML document to become invalid, and hence unparseable by any
> conformant XML parser. It's better to put the XML declaration in and
> explicitly state the character set you use, e.g.:

David has already commented on this, but to draw out the misleading things a bit more explicitly:

It's probably best to avoid a phrase like "ISO-8859-1 characters" altogether, because it's dangerously ambiguous. It can mean

(1) "abstract characters to which ISO-8859-1 assigns code points"

OR

(2) "character data which has been encoded using the code points assigned to abstract characters by ISO-8859-1, where those code points are represented as single 8-bit numbers"

Sense (1) simply means that such abstract characters are present, but makes no statement about how they are encoded. So in sense (1) you can have as many "ISO-8859-1" characters as you like in an XML document declared, explicitly or by default, to be UTF-8 encoded, provided they are indeed UTF-8 encoded.

But if you try to use "ISO-8859-1 characters" in sense (2) in a supposedly UTF-8 encoded document, the parser will throw a fatal error if your characters include any outside the ASCII subset, because many of those 8-bit values are illegal in UTF-8 except in certain positions in a multi-byte sequence. (In ISO-8859-1, for instance, "é" is the single byte 0xE9, whereas UTF-8 encodes it as the two bytes 0xC3 0xA9.)

The trouble is it's very easy to get sense (2) characters if you have to process data handed to you by people who neither know nor care about encoding issues.

So the original posting ought to have read:

"Just be sure your document uses UTF-8 encoding if you don't put it, because that's the default encoding..."
then continued:

"Any bytes in your XML document that are not part of a valid UTF-8 encoding sequence will cause your XML document to become invalid"

and concluded with:

"It's better to put the XML declaration in and explicitly state the encoding used"

Apologies again to those who know all this, but until the enigma of how to get it into the FAQ in a generally understandable way is solved, the struggle has to continue....

Michael
---------------------------------------------------------
Michael Beddow http://www.mbeddow.net/
XML and the Humanities page: http://xml.lexilog.org.uk/
---------------------------------------------------------
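To make the two senses of "ISO-8859-1 characters" concrete, here is a minimal sketch in Python. It assumes Python 3's standard-library `xml.etree.ElementTree` (which sits on the expat parser); the sample word "café" and the helpers `is_valid_utf8()` and `parses()` are illustrative choices, not anything from the thread. It shows that the same abstract characters are fine in an undeclared (hence UTF-8) document when they really are UTF-8 encoded, that their single-byte ISO-8859-1 form makes the parser reject the document, and that an XML declaration naming the encoding actually used makes the same bytes acceptable.

```python
import xml.etree.ElementTree as ET

# Sense (1): "é" (U+00E9) is an abstract character to which ISO-8859-1
# assigns a code point. As a Python str it has no encoding yet.
text = "caf\u00e9"

# Sense (2): the same characters encoded with ISO-8859-1, one byte each.
latin1_bytes = text.encode("iso-8859-1")  # b'caf\xe9'

# The same abstract characters encoded as UTF-8: "é" takes two bytes.
utf8_bytes = text.encode("utf-8")  # b'caf\xc3\xa9'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without error."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def parses(doc: bytes) -> bool:
    """Return True if the parser accepts the bytes as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# No XML declaration (and no byte order mark), so the parser must
# treat the bytes as UTF-8.
no_decl_utf8 = b"<p>" + utf8_bytes + b"</p>"
no_decl_latin1 = b"<p>" + latin1_bytes + b"</p>"

# An XML declaration that names the encoding actually used.
decl_latin1 = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    + b"<p>" + latin1_bytes + b"</p>"
)

if __name__ == "__main__":
    print(is_valid_utf8(utf8_bytes), is_valid_utf8(latin1_bytes))  # True False
    print(parses(no_decl_utf8))    # True: sense (1) characters, UTF-8 encoded
    print(parses(no_decl_latin1))  # False: sense (2) bytes in a UTF-8 document
    print(parses(decl_latin1))     # True: the declaration matches the bytes
    print(ET.fromstring(decl_latin1).text == text)  # True
```

The parser reports the rejected case as a fatal "not well-formed" error, which is how the XML Recommendation classifies encoding errors. Declaring the encoding is one fix; re-encoding the data as genuine UTF-8 would work equally well.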