Subject: Re: [xsl] xml invalid characters
From: Mike Brown <mike@xxxxxxxx>
Date: Fri, 22 Mar 2002 16:08:11 -0700 (MST)

stevenson wrote:
> How can I avoid these problem. The data is from the database, and the
> character crashing it is £

You probably have an encoding problem. I assume that you're having trouble with the British currency symbol for a pound? At least, that's what it looks like on my screen.

Quick lesson:

The POUND SIGN is character number A3 (hex) in Unicode. "U+00A3" is how you can write it unambiguously in prose.

An encoding provides a way of representing that A3 as bytes.

  iso-8859-1: A3
  utf-8:      C2 A3
  utf-16:     00 A3 (big endian)
              A3 00 (little endian)

utf-8 and utf-16 can represent any Unicode character, but other encodings are more limited, usually representing at most 256 characters.

If a character cannot be represented in a particular encoding, you write it as a sequence of characters that can be represented in any encoding (spaces added for clarity):

  & # x A 3 ;   or   & # 1 6 3 ;

For example, us-ascii does not have POUND SIGN (this may be the source of your problem; it's hard to say without knowing all the stages of processing of your data, and the role Cold Fusion plays in it). So you'd have to use this escaped format.

              & # x A 3 ;
  us-ascii:   26 23 78 41 33 3B

And this escaped format (a "character reference") works just as well in other encodings:

  iso-8859-1: 26 23 78 41 33 3B
  utf-8:      26 23 78 41 33 3B
  utf-16:     00 26 00 23 00 78 00 41 00 33 00 3B (big endian)
  utf-16:     26 00 23 00 78 00 41 00 33 00 3B 00 (little endian)

Now check your XML document. When you look at the document in a text editor, it might say

  <?xml version="1.0" encoding="utf-8"?>
                      ^^^^^^^^^^^^^^^^

This encoding declaration is an assertion made by the document as to how its bytes map to Unicode characters. It is just a hint for the XML parser to use when reading the document; it is not a secret code that causes anything about the document's *actual* encoding to change. If this declaration is missing, UTF-8 or UTF-16 is assumed (UTF-8 unless the document begins with the bytes FF FE or FE FF).

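The byte values for U+00A3 can be checked with a short Python sketch. This is only an illustration, not part of any XSLT toolchain: the helper names hex_bytes and assumed_encoding are made up for this example, and assumed_encoding implements only the simplified "FF FE or FE FF means UTF-16, otherwise UTF-8" rule described here, not a parser's full detection logic. Python's utf-16-be codec writes the most significant byte first (00 A3) and utf-16-le writes the least significant byte first (A3 00).

```python
# Illustrative sketch; hex_bytes and assumed_encoding are hypothetical helpers.
POUND = "\u00a3"  # POUND SIGN, U+00A3


def hex_bytes(text, encoding, errors="strict"):
    """Encode text and return its bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding, errors))


def assumed_encoding(first_bytes):
    """Simplified default when no encoding declaration is present:
    UTF-16 if the data starts with FF FE or FE FF, otherwise UTF-8."""
    if first_bytes[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16"
    return "utf-8"


# The same character, represented as bytes in different encodings.
for enc in ("iso-8859-1", "utf-8", "utf-16-be", "utf-16-le"):
    print(f"{enc:>10}: {hex_bytes(POUND, enc)}")

# us-ascii has no POUND SIGN, so a strict encode fails...
try:
    POUND.encode("ascii")
except UnicodeEncodeError as exc:
    print("us-ascii cannot encode it:", exc.reason)

# ...but a character reference is plain ASCII. The xmlcharrefreplace
# error handler emits the decimal form, &#163;
print(POUND.encode("ascii", "xmlcharrefreplace"))

# The hex form of the reference has the same bytes in us-ascii,
# iso-8859-1 and utf-8.
for enc in ("ascii", "iso-8859-1", "utf-8"):
    print(f"{enc:>10}: {hex_bytes('&#xA3;', enc)}")

# Decoding bytes with the wrong encoding garbles the text:
# UTF-8 bytes C2 A3 read as iso-8859-1 become two characters.
print(POUND.encode("utf-8").decode("iso-8859-1"))
```

The last line prints the two characters U+00C2 U+00A3, which is what a UTF-8 encoded pound sign looks like when something reads it as iso-8859-1. That matches the way the character appears in the quoted question, although that alone does not say at which processing step the mismatch happens.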
It is your responsibility to ensure that the encoding declaration is an accurate reflection of the document's *actual* encoding.

As you can guess, this is where most people run into problems. They are passing "text" around in their software without paying attention to whether and how it has been encoded.

So, in order to diagnose encoding-related problems, you must trace the processes that your data passes through, and determine how it is encoded/decoded at each step.

Also, you didn't say what your problem has to do with XSLT. This is the xsl-list. If you have general XML processing questions, ask them on xml-dev. If you're using XSLT, then you usually only need to be concerned about two things:

 - the source and stylesheet XML documents must have accurate
   encoding declarations

 - the output encoding, as controlled by <xsl:output encoding="..."/>,
   should be what you wanted (there is a FAQ regarding invoking MSXML
   from scripts, where the output becomes UTF-16, depending on how you
   capture it)

Good luck.

   - Mike
____________________________________________________________________________
  mike j. brown                   |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list