Re: [xsl] Smart Quote Encoding

Subject: Re: [xsl] Smart Quote Encoding
From: "Deborah Pickett" <debbiep-list-xsl@xxxxxxxxxx>
Date: Thu, 13 Sep 2007 13:56:49 +1000 (EST)
On Thu, September 13, 2007 02:56, Roger L. Cauvin wrote:
> The XML is a log file that logs incoming text-only e-mail messages.  The
> messages sometimes contain special/nonstandard characters, such as smart
> quotes.  If I want to be able to log the verbatim messages yet still be
> able to apply XSLT, what is my best strategy?

The problem is (sort of) the file's encoding, but not in the usual
Latin-1/UTF-8-confusion way.

This error:

>   Error
>     org.xml.sax.SAXParseException: illegal XML character U+18: illegal XML
> character U+18

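says that you have a character U+18 (i.e., U+0018: ASCII CAN, decimal 24,
Ctrl-X) in your file.  That character isn't allowed in XML 1.0, where the
only characters permitted below U+0020 are tab, line feed and carriage
return.  See:
http://www.w3.org/TR/REC-xml/#charsets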

Whatever is generating the "XML" file is erroneously putting that character
in.  You will have to either tell the generator not to do that, or insert a
pipeline stage that converts U+18 into some other character so that the
document is actually XML and can be parsed.
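
As a rough illustration of such a pipeline stage, here is a minimal sketch
in Python (my choice of language; nothing in this thread says what the
logging pipeline is written in).  The function name scrub_xml_text and the
use of U+FFFD as the replacement are assumptions for illustration only.
The character class follows the XML 1.0 Char production from the link
above.

```python
import re

# Code points excluded by the XML 1.0 "Char" production:
# C0 controls other than tab (U+0009), LF (U+000A) and CR (U+000D),
# the surrogate block, and the noncharacters U+FFFE and U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def scrub_xml_text(text, replacement="\ufffd"):
    """Return text with every XML-1.0-illegal character replaced.

    The default replacement, U+FFFD REPLACEMENT CHARACTER, is an
    assumption; pass "" to drop the characters instead.
    """
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    # U+0018 is replaced; tab and newline pass through untouched.
    print(repr(scrub_xml_text("before\x18after\tok\n")))
```

Note that any such substitution means the log is no longer strictly
verbatim, so fixing the generator is still the better option if you can.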

To add to the conformance woes of whatever is producing your input, U+18
is not a printable character in ISO 8859-1, nor are smart quotes part of
true ISO 8859-1 (they are in Windows-1252), so if it is producing the XML
declaration you quoted then it is doubly wrong.
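
To make the ISO 8859-1 versus Windows-1252 point concrete, here is a small
Python sketch.  The sample bytes are made up for illustration (they are not
taken from your log), and the variable names are mine.

```python
# Bytes 0x93 and 0x94 are the double smart quotes in Windows-1252.
raw = b"\x93quoted\x94"

# Decoded as Windows-1252, they become U+201C and U+201D.
as_cp1252 = raw.decode("cp1252")

# Decoded as ISO 8859-1, the same bytes become the C1 control
# characters U+0093 and U+0094, which are not quotation marks at all.
as_latin1 = raw.decode("iso-8859-1")

if __name__ == "__main__":
    print([hex(ord(c)) for c in (as_cp1252[0], as_cp1252[-1])])
    print([hex(ord(c)) for c in (as_latin1[0], as_latin1[-1])])
```

So a file that really contains Windows-1252 smart quotes but declares
ISO 8859-1 will still parse, but the quotes will come out as control
characters rather than as the punctuation the sender typed.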
