Re: [xsl] How to convert XML doc from UTF-8 to ISO-8859-1 char encoding?

Subject: Re: [xsl] How to convert XML doc from UTF-8 to ISO-8859-1 char encoding?
From: "James A. Robinson" <jim.robinson@xxxxxxxxxxxx>
Date: Mon, 11 Jan 2010 06:51:29 -0800
> Assume I have a XML doc which is UTF-8 encoded.
> 
> Can I convert it somehow to ISO-8859-1 encoding?

Since this is an XSLT list, I'll mention the XSLT way to do it:

  Set xsl:output/@encoding to the encoding you want.  Your XSLT
  engine has to support the encoding, naturally.

  For example:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output encoding="ISO-8859-1" />
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates select="node()" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

> And how to  encode it the opposite direction?

Assuming your XML Parser can consume ISO-8859-1, the above stylesheet
with the xsl:output/@encoding set to "UTF-8" would do that.
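
Outside of XSLT, the same re-serialization idea can be sketched with
Python's standard xml.etree.ElementTree module.  This is only an
illustration and not part of the XSLT approach above; the sample document
and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, UTF-8 encoded (no declaration needed for UTF-8).
utf8_doc = "<greeting>café costs 3 €</greeting>".encode("utf-8")

root = ET.fromstring(utf8_doc)

# Re-serialize as ISO-8859-1. The serializer writes a matching XML declaration,
# and characters that ISO-8859-1 cannot represent (here the euro sign) are
# emitted as numeric character references instead of being lost.
latin1_doc = ET.tostring(root, encoding="ISO-8859-1")
print(latin1_doc)
```

Parsing latin1_doc again gives back the same text, so nothing is lost
even though the euro sign has no ISO-8859-1 byte of its own.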
 
> Normally the encoding is defined in an attribute in the top most <xml> tag.

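Careful, the XML declaration (<?xml ... ?>, which isn't really a tag) is
not a requirement, and there are rules about what to do when the encoding
is not declared: absent a byte order mark or outside information such as
an HTTP header, the parser has to assume UTF-8.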
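
A small sketch of the undeclared case, assuming Python's standard
xml.etree.ElementTree parser (the sample bytes are made up for the
example):

```python
import xml.etree.ElementTree as ET

# Hypothetical document with no XML declaration; "é" is stored as the
# two UTF-8 bytes C3 A9.
undeclared = b"<a>\xc3\xa9</a>"

# With no declaration and no byte order mark, the parser treats the
# input as UTF-8.
text = ET.fromstring(undeclared).text
print(repr(text))
```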

> Is there a way to detect if this declaration is true and corresponds with the real encoding in the full XML doc?
> Or if it is faked/misplaced by mistake?

This is more of an XML parser question.  What usually ought to happen is
that if the wrong encoding is declared *AND* the bytes used in the document
can be identified as being incorrect for that encoding, the parser will
reject the document with an error.
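
As a rough illustration of the detectable case, here is a sketch using
Python's standard xml.etree.ElementTree parser.  The sample bytes are
made up, and other parsers may word the error differently:

```python
import xml.etree.ElementTree as ET

# Declared as UTF-8, but "é" is stored as the single ISO-8859-1 byte E9,
# which is not a valid UTF-8 sequence here.
mislabelled = b'<?xml version="1.0" encoding="UTF-8"?><a>\xe9</a>'

try:
    ET.fromstring(mislabelled)
    rejected = False
except ET.ParseError as err:
    rejected = True
    print("parser rejected the document:", err)
```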

But the way encodings work, it is possible for a sequence of bytes to
be valid for a given encoding even if it's wrong according to the
author's intent.  In other words, a character might show up incorrectly
when the glyph is rendered because the encoding is incorrectly stated,
but the XML parser won't know it.
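
A sketch of that undetectable case, again assuming Python's standard
xml.etree.ElementTree parser and made-up sample bytes:

```python
import xml.etree.ElementTree as ET

# Declared as ISO-8859-1, but "é" is really stored as the UTF-8 bytes C3 A9.
mislabelled = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xc3\xa9</a>'

# No error: both bytes are legal ISO-8859-1 characters, so the parser
# reads two characters where the author meant one.
text = ET.fromstring(mislabelled).text
print(repr(text))
```

Every byte value maps to some ISO-8859-1 character, which is why a
document wrongly labelled as ISO-8859-1 never fails on encoding grounds.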


Jim

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
James A. Robinson                       jim.robinson@xxxxxxxxxxxx
Stanford University HighWire Press      http://highwire.stanford.edu/
+1 650 7237294 (Work)                   +1 650 7259335 (Fax)
