Re: translating between character sets

Subject: Re: translating between character sets
From: Mike Brown <mike@xxxxxxxx>
Date: Tue, 17 Oct 2000 14:00:46 -0600 (MDT)

Matthias O. Will wrote:
> My input DTD has encoding UTF-8, and my output DTD encodes according to
> ISO-8859-1. So, in the input, I use entities for umlauts, which I want
> to be umlauts in the output. Example:
> 
> input	output
> --------------
> &auml;	ä
> &ouml;	ö
> &uuml;	ü
> &szlig;	ß
> 
> How would I achieve this mapping?

Why don't you use ISO 10646-1:1993 (~ Unicode) character references?
I.e., your input column should be
 &#228;
 &#246;
 &#252;
 &#223;
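
The numbers in those references are simply the decimal Unicode code points of the characters. As a quick sanity check, here is a small sketch in Python (my choice for illustration only; char_ref is a hypothetical helper, not something from any XML toolkit) that looks up the code point for each entity name in the standard library's HTML entity table:

```python
from html.entities import name2codepoint

def char_ref(name):
    """Return the decimal character reference for a known entity name."""
    # name2codepoint maps e.g. "auml" to the integer code point 228
    return "&#%d;" % name2codepoint[name]

# The four entities from the question, mapped to numeric references
refs = {name: char_ref(name) for name in ("auml", "ouml", "uuml", "szlig")}

for name, ref in refs.items():
    print("&%s;\t%s" % (name, ref))
```

The same lookup works for any other name in that table if you need more than these four.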

But if you must use entity references, declare the entities in the DTD for
the XML document that uses them, using these declarations:
  http://www.oasis-open.org/cover/xml-ISOents.txt

Example:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE myData SYSTEM "http://www.oasis-open.org/cover/xml-ISOents.txt">
<myData>
   ...somewhere in here will be &auml; &ouml; etc. ...
</myData>

Or (better this way; just declare what you need, and don't fetch over a
network):

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE myData [
  <!ENTITY auml    "&#228;" >
  <!ENTITY ouml    "&#246;" >
  <!ENTITY uuml    "&#252;" >
  <!ENTITY szlig   "&#223;" >
]>
<myData>
   ...somewhere in here will be &auml; &ouml; etc. ...
</myData>

The XML parser will replace the entity references with the characters you want. The
*output* you get from an XSLT processor acting on these characters in an
XML document depends on the processor, but if you put

 <xsl:output method="xml" version="1.0" encoding="iso-8859-1"/>

in the stylesheet, you should get the literal iso-8859-1 bytes for the
characters, assuming you've copied them to the result tree. If the output
method must be "html" then you will probably get entity references in the
output.
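
If you want to see both halves of this (entity expansion by the parser, then re-encoding on output) without an XSLT processor at hand, the following is a rough sketch using Python's xml.etree.ElementTree. It is only a stand-in for the parse-then-serialize steps described above, not what an XSLT processor does internally; the SOURCE document is a shortened version of the second example, and the variable names are mine.

```python
import xml.etree.ElementTree as ET

# UTF-8 input that declares only the entities it needs in the internal subset
SOURCE = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE myData [
  <!ENTITY auml    "&#228;" >
  <!ENTITY ouml    "&#246;" >
  <!ENTITY uuml    "&#252;" >
  <!ENTITY szlig   "&#223;" >
]>
<myData>&auml;&ouml;&uuml;&szlig;</myData>
"""

# The parser expands the entity references into real characters
root = ET.fromstring(SOURCE)
text = root.text

# Serializing as ISO-8859-1 writes each of these characters as a single byte
latin1 = ET.tostring(root, encoding="iso-8859-1")

print(repr(text))
print(latin1)
```

After parsing, the element content holds the four characters themselves, and the serialized bytes contain 0xE4 0xF6 0xFC 0xDF rather than entity or character references.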

   - Mike
____________________________________________________________________
Mike J. Brown, software engineer at         My XML/XSL resources:
webb.net in Denver, Colorado, USA           http://www.skew.org/xml/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

