Subject: Re: [xsl] How to read the encoding of an XML document
From: Mike Brown <mike@xxxxxxxx>
Date: Fri, 26 Oct 2001 02:43:06 -0600 (MDT)

> I've been looking at a lot of European web pages, viewing source to see
> what charset they define in the HTML META tag. The majority use
> iso-8859-1, but a few don't. Most notably Turkey and Greece have character
> sets that are quite different. How do I determine if UTF-16 (or UTF-8)
> will work for those languages?

Since UTF-8 and UTF-16 encode every abstract character in the Unicode
repertoire, either will work for any language, as the script (writing
system) used by any one language will require only a very small subset
of that massive repertoire.

However, using UTF-16 to encode an HTML document is problematic for
other reasons, not the least of which is that it is impossible for an
HTML document to use a META tag to identify itself as UTF-16 encoded,
when everything leading up to and including the META has to be US-ASCII.
And then there's the fact that UTF-16 can seriously confuse Netscape.
UTF-8 is your safest bet.

One thing you should be aware of is that so many HTML documents out
there improperly declare their encoding that web browsers allow the
user to override the stated or implied encoding and decode the
document's bytes according to whatever charset the user chooses. You
might find that even if you output in a particular encoding and take
pains to properly declare it, you will still have users who see the
wrong characters because they've got their browsers configured to
always assume it's something else. In fact, I think a lot of browsers
ship that way; you have to turn *on* autodetection...

And then there's the whole issue of fonts, which are user-overridable,
come in different versions with different character coverage, and
ultimately may not even have the right glyphs for the characters in the
document, assuming you've gotten past all the other gotchas.

Good luck.

   - Mike
____________________________________________________________________________
  mike j. brown, fourthought.com  |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  personal: http://hyperreal.org/~mike/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
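
A minimal Python sketch of two claims from the message above: that UTF-8 and UTF-16 can both carry Greek and Turkish text where iso-8859-1 cannot, and that a META tag encoded as UTF-16 is no longer readable as US-ASCII bytes. The sample strings, the META tag text, and the helper names `can_encode` and `round_trips` are assumptions chosen here for illustration; none of them come from the original thread.

```python
# Illustrative sample text (an assumption, not taken from the thread).
samples = {
    "Greek": "Καλημέρα",
    "Turkish": "İstanbul'da güneş",
}


def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def round_trips(text, charset):
    """Return True if encoding and then decoding gives back the same text."""
    return text.encode(charset).decode(charset) == text


for language, text in samples.items():
    for charset in ("iso-8859-1", "utf-8", "utf-16"):
        print(language, charset, can_encode(text, charset))

# A hypothetical META declaration, encoded two ways.
meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-16">'
as_utf8 = meta.encode("utf-8")
as_utf16 = meta.encode("utf-16-le")  # explicit byte order, no BOM

# In UTF-8 the tag is byte-for-byte US-ASCII, so a parser that has not yet
# learned the encoding can still find it. In UTF-16 every ASCII character
# is followed by a zero byte, so the ASCII byte sequence never appears.
print(b"<meta" in as_utf8)
print(b"<meta" in as_utf16)
```

The legacy single-byte charsets the question mentions (for example iso-8859-7 for Greek and iso-8859-9 for Turkish) each cover only their own script, which is why a single page cannot mix them the way it can with UTF-8.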