RE: [xsl] data vs. xml

Subject: RE: [xsl] data vs. xml
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Fri, 4 Apr 2003 16:15:05 +0100
> I modified the xml generator to include <![CDATA[ ]]> 
> elements.  However,  I also found that some of my data 
> contains RTF characters (i.e. \x093, \x096, \xB0).  I believe 
> this is a result of a copy and paste from MSWord into the 
> database program, and it is not an easy thing to fix as the 
> database contains tens of thousands of entries.  I also 
> noticed that the XSLT processor (instant saxon) still had 
> difficulty accepting a <![CDATA[ ]]> node that contained one 
> of the above characters.  My understanding is that the data 
> found within the <![CDATA[ ]]> should be considered just 
> that: data.

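A CDATA section must contain legitimate XML characters. CDATA only
switches off markup recognition; it does not change how the parser
decodes the bytes of the file into characters.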

I suspect the problem is that your XML source file has no encoding
declaration, so the parser assumes UTF-8, and a lone octet such as
0xB0 is not a valid UTF-8 encoding of any XML character.
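
To illustrate the point, here is a minimal Python sketch (not part of
the original setup) that takes the three octet values from the quoted
message and tries to decode them both ways. The helper name
decodes_as_utf8 is made up for this example, and cp1252 is only the
suspected encoding, not a confirmed one.

```python
# The three octet values mentioned in the quoted message.
raw = b"\x93\x96\xb0"


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# In UTF-8 these octets are continuation bytes with no lead byte,
# so the sequence is rejected.
utf8_ok = decodes_as_utf8(raw)

# Assumption: the data really is cp1252. Under that encoding the same
# octets are a left double quote, an en dash, and a degree sign.
as_cp1252 = raw.decode("cp1252")

print(utf8_ok)
print([hex(ord(ch)) for ch in as_cp1252])
```

If the second line prints code points that look like typical MSWord
punctuation, that supports the guess that the data is cp1252.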

You should specify the actual encoding of the XML file in an XML
declaration at its start. Your encoding is probably cp1252, for which
the registered name is windows-1252. Then you need to parse it using
an XML parser that recognizes this encoding. XML parsers are not
required to support any encodings other than UTF-8 and UTF-16.
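
As a rough sketch of the difference the declaration makes, the Python
below parses the same bytes with and without one. The element name
entry and the sample text are invented for the example, and
windows-1252 is assumed to be the real encoding. Python's
xml.etree.ElementTree happens to accept this encoding; as noted above,
other parsers need not.

```python
import xml.etree.ElementTree as ET

# Hypothetical document body: a CDATA section holding one cp1252 octet.
body = b"<entry><![CDATA[45\xb0 angle]]></entry>"

# XML declaration naming the assumed encoding of the file.
declaration = b'<?xml version="1.0" encoding="windows-1252"?>'


def try_parse(data: bytes):
    """Return the root element's text, or None if the parse fails."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# No declaration: the parser assumes UTF-8 and rejects the octet,
# even though it sits inside a CDATA section.
without_decl = try_parse(body)

# With the declaration the octet is read as a degree sign.
with_decl = try_parse(declaration + body)

print(without_decl)
print(with_decl)
```

In the real case the declaration would go on the first line of the
generated file rather than being prepended in code, so the XML
generator is the place to add it.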

Michael Kay
Software AG
home: Michael.H.Kay@xxxxxxxxxxxx
work: Michael.Kay@xxxxxxxxxxxxxx 


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
