Re: [xsl] Non-English character problem

Subject: Re: [xsl] Non-English character problem
From: Mike Brown <mike@xxxxxxxx>
Date: Tue, 9 Jan 2001 15:26:30 -0700 (MST)
Fu, Gwowen wrote:
> I have a node that contains a non-English character, É.
> <SegmentName>RUE DES ÉRABLES</SegmentName>

To represent the character officially known as LATIN CAPITAL LETTER E WITH
ACUTE (U+00C9) in an XML document, your options are as follows:

1. character reference:  &#xC9; or &#201;

2. entity reference: &Eacute;
     ...but this requires adding <!ENTITY Eacute "&#xC9;"> to the DTD

3. literal bytes in the file, in some encoding:
     if encoding="utf-8"       ...bytes are 0xC3 0x89
     if encoding="iso-8859-1"  ...bytes are 0xC9
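A quick way to confirm that all of these options denote the same character is to parse each form and compare the results. The following is a minimal sketch using Python's standard xml.etree.ElementTree, which is built on the expat parser. The sample documents are hypothetical variations on the one in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the same SegmentName written each way listed above.
docs = {
    "hex char ref": b"<SegmentName>RUE DES &#xC9;RABLES</SegmentName>",
    "decimal char ref": b"<SegmentName>RUE DES &#201;RABLES</SegmentName>",
    "entity ref": (b'<!DOCTYPE SegmentName [<!ENTITY Eacute "&#xC9;">]>'
                   b"<SegmentName>RUE DES &Eacute;RABLES</SegmentName>"),
    "utf-8 bytes": (b'<?xml version="1.0" encoding="utf-8"?>'
                    b"<SegmentName>RUE DES \xc3\x89RABLES</SegmentName>"),
    "iso-8859-1 bytes": (b'<?xml version="1.0" encoding="iso-8859-1"?>'
                         b"<SegmentName>RUE DES \xc9RABLES</SegmentName>"),
}

# Every form parses to the same string of characters.
results = {name: ET.fromstring(doc).text for name, doc in docs.items()}
for name, text in results.items():
    print(f"{name:18} -> {text!r}")

# The literal bytes in option 3 are just U+00C9 encoded two different ways.
assert "\u00c9".encode("utf-8") == b"\xc3\x89"
assert "\u00c9".encode("iso-8859-1") == b"\xc9"
```

Once the parser has done its work, the application sees only characters; which of the five spellings was in the file is no longer visible.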

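You apparently have literal bytes in your file. What is the encoding? You
must declare the actual encoding in the XML declaration at the top of the
file, e.g. <?xml version="1.0" encoding="iso-8859-1"?>, or else the wrong
one (utf-8) will normally be assumed. A lone 0xC9 byte is not valid utf-8,
so a conforming parser will then report a fatal error.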
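The effect of the missing declaration can be reproduced in a few lines. This is a minimal sketch, again using Python's xml.etree.ElementTree. The byte string stands in for a hypothetical file that stores É as the single byte 0xC9, and `try_parse` is a hypothetical helper written only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical file contents: É stored as the single byte 0xC9.
body = b"<SegmentName>RUE DES \xc9RABLES</SegmentName>"

def try_parse(data: bytes) -> str:
    """Return the element's text, or a short description of the parse error."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"ParseError: {err}"

# No declaration: utf-8 is assumed, and 0xC9 followed by 'R' is not a valid
# utf-8 sequence, so the document is rejected.
undeclared = try_parse(body)

# Declaring the encoding that was actually used makes the same bytes parse.
declared = try_parse(b'<?xml version="1.0" encoding="iso-8859-1"?>' + body)

print(undeclared)
print(declared)
```

The bytes of the element are identical in both calls. Only the declaration changes how the parser decodes them.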

If you produced the É with a text editor on Windows, it probably wrote the
file using the operating system's native encoding, windows-1252 (or
another windows-125x, depending on regional variations of Windows), unless
it gave you an option to save in some other format. In this encoding, the
character in question is represented by the single byte 0xC9, just like in
iso-8859-1.

XML parsers are only required to support utf-8 and utf-16, so using any
other encoding is risky. Most parsers do support iso-8859-1, but many do
not support windows-1252. Since windows-1252 and iso-8859-1 are very
similar, you can often get away with declaring a windows-1252 file as
iso-8859-1. This is not safe, though, because the encodings are not
identical: bytes 0x80 through 0x9F are printable characters in
windows-1252 (curly quotes, the euro sign, and so on) but control
characters in iso-8859-1.
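The difference between the two encodings is easy to see by decoding the same bytes both ways. This is a minimal sketch using Python's built-in `cp1252` and `latin-1` codecs in place of a parser. The byte string is a hypothetical example of what a Windows editor might save.

```python
# Hypothetical file contents, as a Windows editor might save them in
# windows-1252: a street name in curly quotes, followed by a euro sign.
raw = b"\x93RUE DES \xc9RABLES\x94 \x80"

intended = raw.decode("cp1252")   # what the author actually typed
misread = raw.decode("latin-1")   # what a parser told "iso-8859-1" sees

# Positions where the two readings disagree.
differing = [i for i, (a, b) in enumerate(zip(intended, misread)) if a != b]

print(repr(intended))
print(repr(misread))
print(differing)
```

The É (0xC9) comes out the same in both readings, which is why the misdeclaration often goes unnoticed. The curly quotes and the euro sign, however, silently turn into invisible control characters.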

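Personally, I would recommend using only option 1 above. Character
references are written in plain ASCII, so they mean the same thing
whatever encoding the file is in.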
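If the XML is generated by a program, option 1 can be applied automatically. This is a minimal sketch in Python. `to_ascii_xml_text` is a hypothetical helper: it escapes the markup characters and then uses the standard `xmlcharrefreplace` error handler to turn every non-ASCII character into a decimal character reference.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def to_ascii_xml_text(text: str) -> bytes:
    """Escape &, < and >, then replace each non-ASCII character with a
    decimal character reference such as &#201;."""
    return escape(text).encode("ascii", "xmlcharrefreplace")

value = "RUE DES \u00c9RABLES"
doc = b"<SegmentName>" + to_ascii_xml_text(value) + b"</SegmentName>"
print(doc.decode("ascii"))  # <SegmentName>RUE DES &#201;RABLES</SegmentName>

# Pure ASCII is also valid utf-8, so no encoding declaration is needed, and
# the parser reads back the original character.
assert ET.fromstring(doc).text == value
```

Because the output is pure ASCII, it survives any editor or transfer that preserves ASCII, and no encoding declaration is needed.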

> I tried putting xml:lang='fr-CA' in the xml file and ran xt with parameter
> language=fr-CA.

Language identifiers are just part of the data in XML. They do not affect
how the document is parsed.
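The point can be checked directly with a parser. This is a minimal sketch using Python's xml.etree.ElementTree, which reports `xml:lang` as an ordinary attribute under the reserved XML namespace URI. The sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree names attributes in the reserved "xml" namespace by its URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# xml:lang is parsed as an ordinary attribute and simply reported back.
ok = ET.fromstring(
    b'<SegmentName xml:lang="fr-CA">RUE DES &#xC9;RABLES</SegmentName>')
print(ok.get(XML_LANG), ok.text)

# It has no effect on decoding: an undeclared 0xC9 byte is still invalid
# utf-8, with or without xml:lang.
try:
    ET.fromstring(
        b'<SegmentName xml:lang="fr-CA">RUE DES \xc9RABLES</SegmentName>')
    still_fails = False
except ET.ParseError:
    still_fails = True
print(still_fails)
```

The attribute is available to a stylesheet or application that looks for it, but the encoding problem has to be solved with the declaration or with character references.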

   - Mike
Mike J. Brown, software engineer at … in Denver, Colorado, USA
My XML/XSL resources: …
