Re: [xsl] ANSI encoding

Subject: Re: [xsl] ANSI encoding
From: "Christopher R. Maden" <crism@xxxxxxxxx>
Date: Thu, 23 May 2002 02:32:21 -0700

At 14:41 22/5/02, Joel Konkle-Parker wrote:
>What's the <?xml version="1.0" encoding=""?> encoding="" string for

You've already had a few answers addressing various facets of this; I hope 
this will also be useful.

ANSI, as Mike Kay pointed out, is a standards body.  Their best-known 
encoding is ASCII, whose identifier is "US-ASCII".  (The canonical charset 
name is "ANSI_X3.4-1968"; aliases include "ASCII" and "US-ASCII", which is 
preferred for MIME usage.)  ASCII is a 7-bit encoding, covering values from 
0 to 127; if you have any accented characters or other "weird" letters, you 
are not using ASCII.  Since ASCII is identical to UTF-8 for characters 
127 and below, and doesn't cover any other characters, you might as well 
leave the identifier out: an XML parser assumes UTF-8 when no encoding is 
declared.
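
To make the ASCII point concrete, here is a minimal Python sketch (not part of the original question; the helper name is_pure_ascii is made up for illustration).  It shows that the 128 ASCII byte values decode to the same text whether they are read as US-ASCII or as UTF-8, and gives a quick check for whether a file really is 7-bit clean.

```python
# All 128 ASCII byte values decode identically as US-ASCII and as UTF-8.
ascii_bytes = bytes(range(128))
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit range 0..127."""
    return all(b < 128 for b in data)


# Plain markup is 7-bit clean; an accented letter stored as one byte is not.
print(is_pure_ascii(b"<doc>plain</doc>"))
print(is_pure_ascii("<doc>caf\u00e9</doc>".encode("latin-1")))
```

If the check passes for the whole file, the file is ASCII and so also valid UTF-8, and the encoding declaration can be omitted.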

As others have mentioned, Windows sometimes calls its encoding 
"ANSI".  This is nonsensical, yet true.  If you are using a US or western 
European system, you are using Windows codepage 1252.  This is identical 
with the ISO western European encoding, ISO 8859-1, except for characters 
128-159 (which are control codes in ISO 8859-1 and are punctuation like the 
euro, ellipsis, dagger, em dash, and curly quotes in Windows CP 1252).  If you 
aren't using that middle range, use the label "ISO-8859-1"; if you are 
using that range, use the "windows-1252" label.  That's all if you're sure 
that you actually have an 8-bit encoding, and that the information hasn't 
been stored in UTF-8.  The easiest way to determine this is to open the 
document in a very stupid editor, or to display it with "type" at the DOS 
prompt.  If 
your fancy schmancy euro-characters show up as single characters, it's an 
8-bit encoding; if they show up as sequences of multiple characters, 
usually starting with an accented A of some sort, then you're in UTF-8 and 
don't need a label.  If they always show up as two characters, one of 
which is null (the first in big-endian order, the second in little-endian), 
then it's UTF-16 and you still shouldn't need a label.
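
The eyeball test above can be roughed out in code.  This is only a heuristic sketch in Python, under my own assumptions: the function name guess_xml_encoding_label is hypothetical, it assumes the document starts with "<", and it assumes 8-bit data is either ISO 8859-1 or Windows codepage 1252 and nothing else.

```python
import codecs


def guess_xml_encoding_label(data: bytes) -> str:
    """Heuristic guess at an encoding label for the raw bytes of a document."""
    # UTF-16: a byte order mark, or a null byte paired with the opening "<".
    if (data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
            or b"\x00" in data[:2]):
        return "UTF-16"
    # Valid UTF-8 (this includes pure ASCII): no label needed.
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # 8-bit data: bytes 128-159 are printable only in windows-1252.
    if any(0x80 <= b <= 0x9F for b in data):
        return "windows-1252"
    return "ISO-8859-1"


euro_doc = "<price>\u20ac5</price>"
print(guess_xml_encoding_label(euro_doc.encode("cp1252")))
print(guess_xml_encoding_label(euro_doc.encode("utf-8")))
print(guess_xml_encoding_label(euro_doc.encode("utf-16")))
print(guess_xml_encoding_label("<p>caf\u00e9</p>".encode("latin-1")))

# A UTF-8 euro sign misread as windows-1252 becomes three characters,
# the first of which is an accented a (U+00E2).
misread = "\u20ac".encode("utf-8").decode("cp1252")
print([hex(ord(c)) for c in misread])
```

Trying UTF-8 first is a deliberate choice: 8-bit western European text containing accented letters is very rarely valid UTF-8 by accident, so a successful decode is strong evidence.  It is still a guess, not a proof.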

A complete list of IANA-registered identifiers can be found at <URL: >.

[This is what happens when charset nerds drink too much espresso.]



P.S. Signing your message doesn't help when your public key isn't available 
from any of the usual places.
-- 
Christopher R. Maden, Principal Consultant, crism consulting
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4  5DFC AC52 F825 AFEC 58DA