Re: [xsl] XSLT encoding problem

Subject: Re: [xsl] XSLT encoding problem
From: Mike Brown <mike@xxxxxxxx>
Date: Tue, 8 Jul 2003 14:20:28 -0600 (MDT)

Venkat Gyambavantha wrote:
> I have an xml with UTF-8 encoding. I want to just change the encoding to ISO
> Latin 1 using XSLT.

Regardless of the encoding of the source XML, this is the stylesheet
you would use:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no" encoding="iso-8859-1"/>
  <xsl:template match="/">
    <xsl:copy-of select="/"/>
  </xsl:template>
</xsl:stylesheet>

>  I see the following error
> 
> "An invalid XML character (unicode 0xfc) "

The XML parser is complaining before the XSLT processor even comes into
the picture. The error message is cryptic, because Unicode character
number (hexadecimal) FC (Latin small letter u with umlaut) *is* a legal
character in XML, although it can't appear just anywhere.

Regardless, I suspect that you were trying to change the XML's actual
encoding by manually editing the encoding declaration in the XML. The
encoding declaration is just a hint to the XML parser to tell it how
the document's bytes are supposed to be mapped to Unicode characters.
The encoding that was actually used to produce the bytes of the document
is the only one you are allowed to put in the declaration.
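
To illustrate the point about the declaration, here is a rough sketch in
Python (not XSLT, and not the parser you are using; I am assuming the
standard library's expat-based xml.etree.ElementTree behaves like a typical
XML parser here). The helper name parse_with_declaration is made up for the
example. The same bytes are parsed twice, once with an accurate declaration
and once with an inaccurate one:

```python
import xml.etree.ElementTree as ET

# Document body whose bytes were actually produced as iso-8859-1:
# the u-umlaut is the single byte FC.
latin1_body = "<doc>\u00fc</doc>".encode("iso-8859-1")


def parse_with_declaration(declared: str) -> str:
    """Hypothetical helper: prepend an encoding declaration, then parse."""
    header = f'<?xml version="1.0" encoding="{declared}"?>'.encode("ascii")
    try:
        return ET.fromstring(header + latin1_body).text
    except ET.ParseError as exc:
        # The parser rejects the document before any XSLT could run.
        return f"parse error: {exc}"


right = parse_with_declaration("iso-8859-1")  # declaration matches the bytes
wrong = parse_with_declaration("utf-8")       # declaration contradicts them

print(repr(right))
print(wrong)
```

Only the declaration differs between the two calls; the bytes are
identical, which is why editing the declaration by hand does not convert
anything.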

That is, if your document contains the single byte FC to represent
Latin small letter u with umlaut, then you must declare the encoding
as iso-8859-1. If it uses the two bytes C3 BC to represent Latin small
letter u with umlaut, then you must declare the encoding as utf-8.
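
If you want to check which bytes you actually have, the two
representations can be reproduced with a few lines of Python (again only a
sketch for illustration; the variable names are mine):

```python
u_umlaut = "\u00fc"  # Latin small letter u with umlaut

as_latin1 = u_umlaut.encode("iso-8859-1")
as_utf8 = u_umlaut.encode("utf-8")
print(as_latin1.hex(), as_utf8.hex())  # fc c3bc

# Reading the iso-8859-1 byte as if it were UTF-8 fails outright ...
try:
    as_latin1.decode("utf-8")
    outcome = "decoded"
except UnicodeDecodeError:
    outcome = "not valid UTF-8"

# ... while reading the UTF-8 bytes as iso-8859-1 silently yields two
# wrong characters instead of one right one.
misread = as_utf8.decode("iso-8859-1")
print(outcome, len(misread))
```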

If you accurately declare the encoding, the XML document can be parsed 
and its important bits fed to the XSLT processor, which can build the
source tree from that information. The stylesheet above can be used to
duplicate the source tree and serialize the result as iso-8859-1 encoded
text in XML syntax. You will lose unimportant lexical details that 
a parser is designed to weed through, such as entity and character
references, CDATA sections, and what kind of quotes you had originally
put around attribute values, but the content will be the same.
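
The same parse-then-reserialize round trip can be sketched outside of XSLT.
The following uses Python's xml.etree.ElementTree as a stand-in for the
parser and serializer (an assumption on my part; your XSLT processor's
serializer may differ in details such as quote style or whether it writes an
XML declaration). The sample document is invented for the example:

```python
import xml.etree.ElementTree as ET

# UTF-8 source using single-quoted attributes, character references,
# and a CDATA section.
source = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<doc title='M&#xFC;nchen'><![CDATA[gr\u00fcn]]> &amp; gr&#252;n</doc>"
).encode("utf-8")

root = ET.fromstring(source)  # parser resolves references and CDATA
result = ET.tostring(root, encoding="iso-8859-1")  # re-serialize as Latin-1

print(root.get("title"), "|", root.text)
print(result)
```

The tree holds the same characters either way; only the lexical wrapping
(CDATA, numeric references, quote style) is gone from the output, and each
u-umlaut is now written as the single byte FC.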

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

