Subject: Re: [xsl] Non-English character problem
From: Mike Brown <mike@xxxxxxxx>
Date: Tue, 9 Jan 2001 15:26:30 -0700 (MST)
Fu, Gwowen wrote:
> I have a node contains non English character É.
> <SegmentName>RUE DES ÉRABLES</SegmentName>

To represent the character officially known as LATIN CAPITAL LETTER E
WITH ACUTE in an XML document, your options are as follows:

1. character reference: &#201; or &#xC9;

2. entity reference: &Eacute;
   ...but this requires adding <!ENTITY Eacute "&#201;"> to the DTD

3. literal bytes in the file, in some encoding:
   if encoding="utf-8" ...bytes are 0xC3 0x89
   if encoding="iso-8859-1" ...bytes are 0xC9

You apparently have literal bytes in your file. What is the encoding?
You must declare the actual encoding in the prolog, or else the wrong
one (utf-8) is likely to be assumed.

If you produced the É with a text editor on Windows, it probably wrote
the file using the operating system's native encoding, windows-1252 (or
another windows-125x, depending on the regional version of Windows),
unless it gave you an option to save in some other format. In this
encoding, the character in question is represented by the single byte
0xC9, just like in iso-8859-1.

XML parsers are only required to support utf-8 and utf-16, so using any
other encoding is risky. Most parsers do support iso-8859-1, but many do
not support windows-1252. Since windows-1252 and iso-8859-1 are very
similar, you can often get away with misdeclaring the encoding as
iso-8859-1. This is not safe, though, since the encodings are not
identical: windows-1252 assigns printable characters to bytes 0x80-0x9F,
which iso-8859-1 reserves for control characters.

Personally, I would recommend only using option 1, above.

> I tried putting "xml:lang='fr-CA' in the xml file and run xt with parameter
> language=fr-CA.

Language identifiers are just part of the data in XML. They do not
affect how the document is parsed.

   - Mike
____________________________________________________________________
Mike J. Brown, software engineer at webb.net in Denver, Colorado, USA
My XML/XSL resources: http://skew.org/xml/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
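
For anyone who wants to check the byte values and parser behaviour described in the message above, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree (expat) rather than xt, and the variable names and sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET

CHAR = "\u00c9"  # LATIN CAPITAL LETTER E WITH ACUTE

# Option 3: the literal bytes depend on the encoding used to save the file.
utf8_bytes = CHAR.encode("utf-8")            # two bytes: 0xC3 0x89
latin1_bytes = CHAR.encode("iso-8859-1")     # one byte: 0xC9
cp1252_bytes = CHAR.encode("windows-1252")   # one byte: 0xC9, same as iso-8859-1

# The two encodings differ elsewhere, e.g. the euro sign exists only in windows-1252.
euro_cp1252 = "\u20ac".encode("windows-1252")

# Option 1: a numeric character reference keeps the file pure ASCII,
# so no encoding declaration is needed. (Hypothetical sample document.)
ref_doc = b"<SegmentName>RUE DES &#201;RABLES</SegmentName>"

# Option 3 with the actual encoding declared in the prolog.
latin1_doc = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>'
    b"<SegmentName>RUE DES \xc9RABLES</SegmentName>"
)

# Same bytes with no declaration: the parser assumes utf-8,
# where a lone 0xC9 followed by "R" is not a valid sequence.
undeclared_doc = b"<SegmentName>RUE DES \xc9RABLES</SegmentName>"

text_from_ref = ET.fromstring(ref_doc).text
text_from_latin1 = ET.fromstring(latin1_doc).text

try:
    ET.fromstring(undeclared_doc)
    undeclared_parsed = True
except ET.ParseError:
    undeclared_parsed = False

print(utf8_bytes.hex(), latin1_bytes.hex(), cp1252_bytes.hex())
print(text_from_ref == text_from_latin1, undeclared_parsed)
```

The character reference and the correctly declared iso-8859-1 document should yield the same text, while the undeclared document is rejected as not well-formed. Other parsers may report the problem differently.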