Re: [xsl] Smart Quote Encoding

Subject: Re: [xsl] Smart Quote Encoding
From: "Deborah Pickett" <debbiep-list-xsl@xxxxxxxxxx>
Date: Thu, 13 Sep 2007 13:56:49 +1000 (EST)
On Thu, September 13, 2007 02:56, Roger L. Cauvin wrote:
> The XML is a log file that logs incoming text-only e-mail messages.  The
> messages sometimes contain special/nonstandard characters, such as smart
> quotes.  If I want to be able to log the verbatim messages yet still be
> able to apply XSLT, what is my best strategy?

The problem is (sort of) the file's encoding, but not in the usual
Latin-1/UTF-8-confusion way.

This error:

>   Error
>     org.xml.sax.SAXParseException: illegal XML character U+18: illegal
>     XML character U+18

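says that you have a character U+18 (i.e., ASCII CAN, decimal 24, Ctrl-X)
in your file.  That character isn't allowed in XML 1.0: of the control
characters below U+20, only tab (U+9), line feed (U+A) and carriage
return (U+D) are permitted.  See: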

Whatever is generating the "XML" file is putting that character in,
erroneously.  You will have to either tell the generator to not do that,
or you will have to insert a pipeline stage that converts U+18 into some
other character so that the document is actually XML and can be parsed.
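
As a rough illustration of such a pipeline stage, here is a minimal Python
sketch.  It is not taken from the original poster's setup: the name scrub
and the choice of "?" as the replacement are hypothetical, and it assumes
the file is in an ASCII-compatible encoding (ISO 8859-1, Windows-1252 or
UTF-8, not UTF-16), so that any byte below 0x20 really is a control
character.

```python
import re

# Bytes below 0x20 that XML 1.0 forbids: every C0 control except
# tab (0x09), line feed (0x0A) and carriage return (0x0D).
_FORBIDDEN = re.compile(rb"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def scrub(data: bytes, replacement: bytes = b"?") -> bytes:
    """Replace each forbidden control byte with `replacement`.

    Hypothetical helper: works on raw bytes, so it makes no attempt to
    guess or repair the file's declared encoding.
    """
    return _FORBIDDEN.sub(replacement, data)
```

Run over the raw log before it reaches the XML parser, this turns each
U+18 into a placeholder.  It cannot recover whatever character the
generator meant to write, so fixing the generator is still the better
option.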

To add to the conformance woes of whatever is producing your input, U+18
is not a printable character in ISO 8859-1, nor are smart quotes part of
true ISO 8859-1 (they are in Windows-1252), so if it is producing the XML
declaration you quoted then it is doubly wrong.
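
To make the ISO 8859-1 versus Windows-1252 point concrete, here is a
small Python check using only the standard codecs.  The variable names
are just for illustration.

```python
LEFT_SINGLE_QUOTE = "\u2018"  # the "smart" opening single quote

# Windows-1252 has a slot for the smart quote: byte 0x91.
cp1252_bytes = LEFT_SINGLE_QUOTE.encode("cp1252")

# True ISO 8859-1 has no smart quotes, so encoding fails.
try:
    LEFT_SINGLE_QUOTE.encode("iso-8859-1")
    latin1_can_encode = True
except UnicodeEncodeError:
    latin1_can_encode = False

# Reading the Windows-1252 byte as ISO 8859-1 yields the C1 control
# character U+0091 rather than a quote.
misread = cp1252_bytes.decode("iso-8859-1")
```

So a file that contains Windows-1252 smart-quote bytes but declares
itself as ISO-8859-1 will parse, but the quotes come through as control
characters instead of punctuation.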

