Re: [xsl] How to read the encoding of an XML document

Subject: Re: [xsl] How to read the encoding of an XML document
From: James Garriss <jpgarriss@xxxxxxxx>
Date: Thu, 25 Oct 2001 17:18:09 -0400
At 01:33 PM 10/25/2001 -0700, Christopher R. Maden wrote:
At 12:59 25-10-2001, James Garriss wrote:
Ok. If you recall, I started this discussion by mentioning that I am receiving XML documents from several European countries. So the pertinent question for me is "if UTF-8 and/or UTF-16 will be the output encoding I must use, will they handle characters from the languages I care about?"

So it seems to me that I should be safe outputting my data to UTF-16. Does that make sense?

Yes. UTF-8 and UTF-16 both cover the entire Unicode repertoire. The difference is that UTF-8 uses a varying number of bytes per character (one for ASCII, more for everything else), while UTF-16 uses 2 bytes for most characters. For European content, UTF-8 is usually a win; for Asian content, UTF-16 is generally better. But either can represent the entire Unicode repertoire.
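
To make the size trade-off concrete, here is a minimal Python sketch that counts the bytes each encoding needs for a few short strings. The sample strings and the helper name encoded_sizes are illustrative assumptions, not anything from this thread, and the UTF-16 count deliberately leaves out the byte order mark.

```python
# Illustrative samples chosen for this sketch; they are not from the thread.
samples = {
    "English": "encoding",
    "French": "d\u00e9j\u00e0 vu",                                  # déjà vu
    "Greek": "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac",    # Ελληνικά
    "Turkish": "T\u00fcrk\u00e7e",                                  # Türkçe
    "Japanese": "\u65e5\u672c\u8a9e",                               # 日本語
}


def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


for language, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{language:9} {len(text):2} chars  "
          f"UTF-8: {utf8_len:2} bytes  UTF-16: {utf16_len:2} bytes")
```

Mostly-ASCII text comes out smaller in UTF-8, while the Japanese sample comes out smaller in UTF-16, which is the pattern described above.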

I've been looking at a lot of European web pages, viewing source to see what charset they define in the HTML META tag. The majority use iso-8859-1, but a few don't. Most notably, pages from Turkey and Greece use character sets that are quite different. How do I determine if UTF-16 (or UTF-8) will work for those languages?
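
One way to test this empirically might be a round-trip check: try to encode some sample text in each candidate encoding and see whether it fails. The sketch below is only that, a sketch. The helper name can_encode and the sample strings are my own assumptions, and the encoding names are the ones Python's standard codecs accept.

```python
def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample text; substitute strings from the real documents.
samples = {
    "Greek": "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac",   # Ελληνικά
    "Turkish": "\u0130stanbul'da \u0131\u015f\u0131k",             # İstanbul'da ışık
}

encodings = ["iso-8859-1", "iso-8859-7", "iso-8859-9", "utf-8", "utf-16"]

for language, text in samples.items():
    for encoding in encodings:
        verdict = "ok" if can_encode(text, encoding) else "cannot represent"
        print(f"{language:8} {encoding:11} {verdict}")
```

With these samples, only the two UTF encodings handle both languages; each of the iso-8859 sets covers at most one of them.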


--James


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


