RE: [xsl] MSXML - Processing non standard characters

Subject: RE: [xsl] MSXML - Processing non standard characters
From: "Andrew Kimball" <akimball@xxxxxxxxxxxxx>
Date: Wed, 1 Aug 2001 16:13:12 -0700

Warren,

You wrote:
>I am trying to transform an HTTP XML document which contains special
>characters using MSXML. I receive the following error when the
>transformation occurs:
>
>XML Error loading ''
>An invalid character was found in text content.
>
>I have no control over the format of the XML document. The XML document has
><?xml version="1.0"?> in the first line. Microsoft's site says: Re-encode the
>XML data as proper UTF-8.
>
>I added the following to my XSL file but it still doesn't work: <?xml
>version="1.0" encoding="UTF-8" ?>
>
>Since I can't change the original XML file, how can I resolve this problem.
>

I suspect the data includes characters that are not legal in XML, such
as 0x1 or 0x2.  You can check this by looking at a hex dump of your
file.  The XML 1.0 spec allows only these characters:

[2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
                      | [#x10000-#x10FFFF]
       /* any Unicode character, excluding the surrogate blocks, FFFE,
          and FFFF. */
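
As a rough way to check a document against the production above, here is
a minimal Python sketch.  The function name find_invalid_xml_chars is
made up for illustration, and the sketch assumes the data has already
been decoded to a string; it is not part of MSXML or of any XML library.

```python
import re

# Complement of the XML 1.0 Char production: anything NOT in
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters outside Char.

    Hypothetical helper for illustration only.
    """
    return [(m.start(), ord(m.group()))
            for m in _INVALID_XML_CHAR.finditer(text)]

# Example: 0x01 and ESC (0x1B, decimal 27) are both reported.
print(find_invalid_xml_chars("ok\x01bad\x1b"))  # [(2, 1), (6, 27)]
```

An empty list means every character in the string is a legal XML 1.0
character; it does not mean the document is otherwise well-formed.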

Notice that many control characters are excluded (including ESC, which
is #x1B, decimal 27).  I assume the reason is that XML is a text-based
format.  In practice, however, there is a need to represent these
characters.  I don't know why the XML WG didn't allow them to be written
as character references (e.g. &#x01; ).  Anybody know?  Maybe they will
fix this hole in a future version of the spec.

Until then, MSXML is correctly rejecting these illegal XML characters
(all conformant parsers must).  I'd say you should talk to your XML
supplier and point out they're sending you invalid XML data.  If this is
not possible, then you might have to preprocess the data and remove
invalid characters before sending the data to the parser.
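
The preprocessing step could look something like the following Python
sketch.  It is only an illustration under stated assumptions: the name
strip_invalid_xml_chars is hypothetical, the input is assumed to be text
that has already been decoded, and Python's ElementTree stands in for
whatever parser you actually hand the cleaned data to.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 Char production.
_INVALID_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml_chars(text):
    """Drop every character that XML 1.0 does not allow.

    Hypothetical helper for illustration only. Note that this
    silently discards data, so it changes the content.
    """
    return _INVALID_XML_CHAR.sub("", text)

dirty = "<a>x\x01y\x1bz</a>"
clean = strip_invalid_xml_chars(dirty)
print(clean)                      # <a>xyz</a>
print(ET.fromstring(clean).text)  # xyz
```

Dropping the characters is the simplest choice; if they carry meaning,
replacing them with some agreed placeholder may be safer than deleting
them.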

~Andy Kimball
MSXSL Dev

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

