Subject: Re: [xsl] ANSI encoding
From: "Christopher R. Maden" <crism@xxxxxxxxx>
Date: Thu, 23 May 2002 02:32:21 -0700
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 14:41 22/5/02, Joel Konkle-Parker wrote:
>What's the <?xml version="1.0" encoding=""?> encoding="" string for
>ANSI?

You've already had a few answers addressing various facets of this; I hope this will also be useful.

ANSI, as Mike Kay pointed out, is a standards body. Their best-known encoding is ASCII, whose identifier is "US-ASCII". (The canonical charset name is "ANSI_X3.4-1968"; aliases include "ASCII" and "US-ASCII", which is preferred for MIME usage.) ASCII is a 7-bit encoding, covering values from 0 to 127; if you have any accented characters or other "weird" letters, you are not using ASCII. Since ASCII is identical to UTF-8 for characters 127 and below, and doesn't cover any other characters, you might as well leave the identifier out, since UTF-8 is the default.

As others have mentioned, Windows sometimes calls its encoding "ANSI". This is nonsensical, yet true. If you are using a US or western European system, you are using Windows codepage 1252. This is identical to the ISO western European encoding, ISO 8859-1, except for characters 128-159 (which are control codes in ISO 8859-1 and are punctuation like the euro, ellipsis, dagger, em dash, and curly quotes in Windows CP 1252). If you aren't using that middle range, use the label "ISO-8859-1"; if you are using that range, use the "windows-1252" label.

That's all if you're sure that you actually have an 8-bit encoding, and that the information hasn't been stored in UTF-8. The easiest way to determine this is to open the document in a very stupid editor, or to use "type" at the DOS prompt. If your fancy schmancy euro-characters show up as single characters, it's an 8-bit encoding; if they show up as sequences of multiple characters, usually starting with an accented A of some sort, then you're in UTF-8 and don't need a label. If every character, plain letters included, shows up as two characters, one of which is null, then it's UTF-16 and you still shouldn't need a label.
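If you'd rather not eyeball it, the same checks can be roughed out in a few lines of Python. This is only a sketch of the heuristic described above, not a real encoding detector: the function name guess_xml_encoding_label is made up for illustration, it assumes UTF-16 files start with a byte-order mark, and it assumes that anything which is neither ASCII, UTF-8, nor UTF-16 is a western European 8-bit encoding.

```python
def guess_xml_encoding_label(data: bytes) -> str:
    """Guess a label for the XML encoding declaration (hypothetical helper)."""
    # Assumption: UTF-16 text begins with a byte-order mark (either byte order).
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "UTF-16"
    # Only values 0-127: plain ASCII, which is also valid UTF-8.
    if all(byte < 0x80 for byte in data):
        return "US-ASCII"
    # Bytes above 127 that form valid UTF-8 sequences are almost certainly UTF-8.
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # Assumption: what remains is a western European 8-bit encoding.
    # Bytes 128-159 are punctuation in CP 1252 but control codes in ISO 8859-1.
    if any(0x80 <= byte <= 0x9F for byte in data):
        return "windows-1252"
    return "ISO-8859-1"


# What the "very stupid editor" shows for UTF-8 text read as CP 1252:
mojibake = "é".encode("utf-8").decode("cp1252")
```

For example, guess_xml_encoding_label("café".encode("latin-1")) gives "ISO-8859-1", while the same call on "€5".encode("cp1252") gives "windows-1252" because the euro sign sits at byte 128. The mojibake value comes out as "Ã©", which is the "accented A of some sort" effect. For US-ASCII, UTF-8, and UTF-16 results you can leave the encoding declaration out entirely.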

A complete list of IANA-registered identifiers can be found at <URL: http://www.iana.org/assignments/character-sets >.

[This is what happens when charset nerds drink too much espresso.]

~Chris

>-----BEGIN PGP SIGNATURE-----

P.S. Signing your message doesn't help when your public key isn't available from any of the usual places.
- -- 
Christopher R. Maden, Principal Consultant, crism consulting
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4 5DFC AC52 F825 AFEC 58DA

-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Privacy 6.5.8

iQA/AwUBPOy3JaxS+CWv7FjaEQJy1QCbB1RoZtUWzQXVwDqBkopJ5jycg8YAmwdH
1NgVgikf5WevBGwg5AQmbnZn
=/+JM
-----END PGP SIGNATURE-----

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list