Re: [xsl] How to read the encoding of an XML document

Subject: Re: [xsl] How to read the encoding of an XML document
From: Mike Brown <mike@xxxxxxxx>
Date: Fri, 26 Oct 2001 02:43:06 -0600 (MDT)
> I've been looking at a lot of European web pages, viewing source to see 
> what charset they define in the HTML META tag.  The majority use 
> iso-8859-1, but a few don't.  Most notably Turkey and Greece have character 
> sets that are quite different.  How do I determine if UTF-16 (or UTF-8) 
> will work for those languages?

Since UTF-8 and UTF-16 can each encode every abstract character in the
Unicode repertoire, either will work for any language: the script (writing
system) used by any one language requires only a very small subset of that
massive repertoire.
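
If you want to check this for yourself, here is a rough Python sketch. The
sample strings and the encodable() helper are my own illustrations, not
taken from any real page; the charset names are the standard ones Python
ships with.

```python
# Illustrative sample text; any Turkish or Greek string would do.
SAMPLES = {
    "Turkish": "İstanbul'da güneşli bir gün",
    "Greek": "Καλημέρα κόσμε",
}


def encodable(text: str, charset: str) -> bool:
    """Return True if every character in text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


for language, text in SAMPLES.items():
    for charset in ("utf-8", "utf-16", "iso-8859-1"):
        print(language, charset, encodable(text, charset))
```

Both samples should encode in UTF-8 and UTF-16 and fail in iso-8859-1,
which lacks letters such as the Turkish dotted capital I and the whole
Greek alphabet.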

However, using UTF-16 to encode an HTML document is problematic for other
reasons. Not the least of them is that an HTML document cannot use a META
tag to identify itself as UTF-16 encoded, since everything up to and
including the META has to be readable as US-ASCII before the declared
charset is known. UTF-16 can also seriously confuse Netscape. UTF-8 is your
safest bet.
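
To make the META problem concrete, here is a small Python sketch. The PAGE
template and the meta_visible_as_ascii() helper are hypothetical, and the
naive byte search only stands in for "something that assumes ASCII while
looking for the META"; it is not how any particular browser sniffs.

```python
# Hypothetical page template; {0} is filled with the declared charset.
PAGE = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset={0}"></head>'
    "<body>Γειά</body></html>"
)


def meta_visible_as_ascii(charset: str) -> bool:
    """Encode the page in charset, then look for an ASCII '<meta' in the bytes."""
    data = PAGE.format(charset).encode(charset)
    return b"<meta" in data


for charset in ("utf-8", "utf-16"):
    print(charset, meta_visible_as_ascii(charset))
```

In UTF-8 the markup is byte-for-byte ASCII, so the tag is found. In UTF-16
every ASCII character is padded with a zero byte, so an ASCII-minded reader
never sees the tag that was supposed to tell it the encoding.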

One thing you should be aware of is that so many HTML documents out there
declare their encoding improperly that web browsers let the user override
the stated or implied encoding and decode the document's bytes according to
whatever charset the user chooses. Even if you output in a particular
encoding and take pains to declare it properly, you may still have users
who see the wrong characters because their browsers are configured to
always assume something else. In fact, I think a lot of browsers ship that
way; you have to turn *on* autodetection...
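
What such a user ends up seeing can be simulated in Python. The decode_as()
helper and the sample word are illustrative assumptions; the point is only
that the same bytes yield different characters under a different charset.

```python
def decode_as(data: bytes, assumed_charset: str) -> str:
    """Decode bytes the way a browser forced to assumed_charset would."""
    return data.decode(assumed_charset, errors="replace")


original = "Türkçe"
sent = original.encode("utf-8")  # what the server actually sends

seen_correct = decode_as(sent, "utf-8")     # browser honors the declaration
seen_wrong = decode_as(sent, "iso-8859-1")  # browser forced to iso-8859-1

print(seen_correct)
print(seen_wrong)
```

Each two-byte UTF-8 sequence turns into two unrelated Latin-1 characters in
the second case, which is the familiar garbage you see on misconfigured
browsers.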

And then there's the whole issue of fonts, which are user-overridable,
come in different versions with different character coverage, and
ultimately may not even have the right glyphs for the characters in the
document, assuming you've gotten past all the other gotchas. Good luck.

   - Mike
  mike j. brown
  denver/boulder, colorado, usa
