Re: [xsl] Non-English character problem

Fu, Gwowen wrote:
> I have a node contains non English character É.
> <SegmentName>RUE DES ÉRABLES</SegmentName>

To represent the character officially known as LATIN CAPITAL LETTER E WITH
ACUTE (U+00C9) in an XML document, your options are as follows:

1. character reference:  &#xC9; or &#201;

2. entity reference: &Eacute;
     ...but this requires adding <!ENTITY Eacute "&#xC9;"> to the DTD

3. literal bytes in the file, in some encoding:
     if encoding="utf-8"       ...bytes are 0xC3 0x89
     if encoding="iso-8859-1"  ...bytes are 0xC9
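
As a quick check of the numbers above, here is a minimal Python sketch (Python
is my choice for illustration, it is not part of the original question). It
only uses the standard library; the variable names are made up for the example.

```python
import unicodedata

ch = "\u00C9"  # the character in question

# Official Unicode name of the character
name = unicodedata.name(ch)

# Option 1: numeric character references (hexadecimal and decimal)
char_ref_hex = "&#x{:X};".format(ord(ch))
char_ref_dec = "&#{};".format(ord(ch))

# Option 3: the literal bytes, which depend on the encoding
utf8_bytes = ch.encode("utf-8")
latin1_bytes = ch.encode("iso-8859-1")

print(name)
print(char_ref_hex, char_ref_dec)
print(utf8_bytes.hex(), latin1_bytes.hex())
```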

You apparently have literal bytes in your file. What is the encoding? You
must declare the actual encoding in the XML declaration at the top of the
file, e.g. <?xml version="1.0" encoding="iso-8859-1"?>, or else the wrong
one (utf-8) is likely to be assumed.
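
To illustrate what goes wrong, here is a rough Python sketch using the
standard xml.etree.ElementTree module (an assumption on my part; you are using
xt, which may report the problem differently). It parses the same
iso-8859-1 bytes once with the encoding declared and once without.

```python
import xml.etree.ElementTree as ET

# The element from the question, stored as iso-8859-1 bytes (É is 0xC9)
body = "<SegmentName>RUE DES \u00C9RABLES</SegmentName>".encode("iso-8859-1")

# With the actual encoding declared, the parser decodes 0xC9 correctly
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>' + body
segment_text = ET.fromstring(declared).text

# Without a declaration the parser assumes utf-8, where a lone 0xC9
# followed by "R" is not a valid byte sequence
try:
    ET.fromstring(body)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(segment_text, undeclared_ok)
```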

If you produced the É with a text editor on Windows, it probably wrote the
file using the operating system's native encoding, windows-1252 (or
another windows-125x, depending on regional variations of Windows), unless
it gave you an option to save in some other format. In this encoding, the
character in question is represented by the single byte 0xC9, just like in
iso-8859-1.

XML parsers are only required to support utf-8 and utf-16, so using any
other encoding is risky. Most parsers do support iso-8859-1, but many do
not support windows-1252. Since windows-1252 and iso-8859-1 differ only in
the byte range 0x80 to 0x9F, you can often get away with misdeclaring a
windows-1252 file as iso-8859-1. This is not safe, though: any byte in that
range (curly quotes, the euro sign, etc.) will be read as a control
character instead of the character you intended.
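
If you want to see exactly where the two encodings disagree, the following
Python sketch compares them byte by byte. It relies on Python's own codec
tables, which I am assuming match what your tools use; a few windows-1252
bytes are undefined in Python's table and are counted as differing.

```python
# Collect every byte value that decodes differently in the two encodings.
# errors="replace" turns bytes undefined in windows-1252 into U+FFFD.
differing = [
    b for b in range(256)
    if bytes([b]).decode("windows-1252", errors="replace")
    != bytes([b]).decode("iso-8859-1")
]

# The byte for É is the same in both encodings
same_for_e_acute = (
    b"\xc9".decode("windows-1252") == b"\xc9".decode("iso-8859-1")
)

# Byte 0x80 is the euro sign in windows-1252 but a control character
# in iso-8859-1
euro_1252 = b"\x80".decode("windows-1252")
ctrl_latin1 = b"\x80".decode("iso-8859-1")

print(hex(differing[0]), hex(differing[-1]), len(differing))
print(same_for_e_acute, repr(euro_1252), repr(ctrl_latin1))
```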

Personally, I would recommend only using option 1, above.
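
If you have many such characters, you do not have to type the references by
hand. As one possible approach (a sketch in Python, not something xt does for
you), you can re-encode the text as pure ASCII and let every non-ASCII
character become a decimal character reference.

```python
import xml.etree.ElementTree as ET

text = "<SegmentName>RUE DES \u00C9RABLES</SegmentName>"

# xmlcharrefreplace substitutes &#NNN; for anything outside ASCII
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")

# The result contains only ASCII bytes, so it parses the same way
# whether the parser assumes utf-8 or iso-8859-1
round_trip = ET.fromstring(ascii_bytes).text

print(ascii_bytes)
print(round_trip)
```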

> I tried putting "xml:lang='fr-CA' in the xml file and run xt with parameter
> language=fr-CA.

Language identifiers are just part of the data in XML. They do not affect
how the document is parsed.

   - Mike
____________________________________________________________________
Mike J. Brown, software engineer at            My XML/XSL resources: 
webb.net in Denver, Colorado, USA              http://skew.org/xml/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

Current Thread

[xsl] Non-English character problem
- Fu, Gwowen - Tue, 9 Jan 2001 14:56:08 -0600
  - Fernando López Carballeda - Tue, 9 Jan 2001 22:26:18 +0100
  - Mike Brown - Tue, 9 Jan 2001 15:26:30 -0700 (MST) <=
  - Miloslav Nic - Wed, 10 Jan 2001 07:54:00 +0100
  - <Possible follow-ups>
  - Fu, Gwowen - Tue, 9 Jan 2001 15:45:42 -0600
  - Clapham, Paul - Tue, 9 Jan 2001 14:24:16 -0800
  - Fu, Gwowen - Tue, 9 Jan 2001 17:13:26 -0600
