Re: [xsl] How to read the encoding of an XML document

Subject: Re: [xsl] How to read the encoding of an XML document
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 25 Oct 2001 16:53:54 +0100
> When you say Unicode, does that equate to UTF-8, UTF-16, UTF-32 or 
> something else?  
No. Unicode is essentially an abstract collection of characters, numbered
0 to #x10FFFF (most of those slots are empty). An XML character reference
such as &#333; refers to abstract character number 333.
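To illustrate the point, here is a small sketch. It uses Python's standard xml.etree.ElementTree parser, not the MSXML parser discussed in this thread. Parsing a character reference yields one character whose number is 333, and the result involves no bytes or encoding at all.

```python
import sys
import xml.etree.ElementTree as ET

# &#333; is a character reference: it names code point 333 (hex 14D),
# independent of how the document itself is stored on disk.
text = ET.fromstring("<a>&#333;</a>").text

print(len(text), ord(text), hex(ord(text)))  # 1 333 0x14d

# The Unicode code space ends at #x10FFFF.
print(hex(sys.maxunicode))  # 0x10ffff
```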

However, to store Unicode strings in files (and other places) you need
an encoding that maps the bytes in the file to these characters. The
UTF-x forms are some of those encodings (every UTF encoding can encode
the whole Unicode range). Other encodings, such as ASCII or Latin-1,
work the same way but can only encode a subset of the characters.
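A quick Python sketch shows the difference: one abstract character, U+014D (the &#333; mentioned above), becomes different byte sequences under different encodings. Latin-1 cannot represent it at all. The "-be" (big-endian) codec variants are chosen only so that no byte-order mark appears in the output. The choice of characters is illustrative.

```python
# One abstract character, U+014D (o with macron), in three UTF encodings.
# Every UTF form can represent it; only the bytes differ.
ch = chr(333)
for enc in ("utf-8", "utf-16-be", "utf-32-be"):
    print(enc, ch.encode(enc).hex())
# utf-8 c58d
# utf-16-be 014d
# utf-32-be 0000014d

# Latin-1 only covers code points 0-255, so it cannot store this character.
try:
    ch.encode("latin-1")
except UnicodeEncodeError:
    print("latin-1 cannot encode U+014D")

# A character inside Latin-1's range, e-acute (U+00E9), is a single byte.
print(chr(0xE9).encode("latin-1").hex())  # e9
```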

> Or does the answer depend upon the XML parser you are 
> using, which in my case is MSXML3.0?

No. Internally the parser obviously has to use some encoding to store
things (often this is UTF-16, as it is in MSXML). In some programming
APIs you need to know this because you get handed the string directly,
but in XSLT you never need to know what happens internally.
Your XSLT stylesheet is an XML document, so it goes through the same
process.

Character data in the stylesheet is mapped to abstract Unicode
characters (using the encoding specified in the stylesheet), and the
same happens for the source document (using its own encoding). It is
these abstract characters that are compared. So by then you don't need
to know (and can't find out) what encoding the original files used.

So your source might be in Latin-2 and your stylesheet in Latin-1, but
by the time both have been parsed everything is in abstract Unicode
characters, and it is these that are compared in any XSLT expression.
(In fact MSXML3 uses UTF-16, but this is an internal detail that has no
effect on the stylesheet.)
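The same effect can be sketched in Python, again with xml.etree.ElementTree standing in for an XSLT processor. One deliberate substitution: the sketch pairs ISO-8859-1 with UTF-16 rather than Latin-2 with Latin-1. The bundled expat parser supports both encodings natively, and their byte sequences differ visibly. The document is stored two ways, and after parsing only the abstract characters remain.

```python
import xml.etree.ElementTree as ET

# The same small document stored in two different encodings.
# (ISO-8859-1 and UTF-16 are an illustrative choice, not the ones
# mentioned in the thread.)
latin1_bytes = '<?xml version="1.0" encoding="ISO-8859-1"?><w>caf\u00e9</w>'.encode("iso-8859-1")
utf16_bytes = '<?xml version="1.0" encoding="UTF-16"?><w>caf\u00e9</w>'.encode("utf-16")

# On disk the bytes differ...
print(latin1_bytes == utf16_bytes)  # False

# ...but after parsing, both give the same abstract characters, and the
# parsed element carries no record of which encoding the input used.
a = ET.fromstring(latin1_bytes).text
b = ET.fromstring(utf16_bytes).text
print(a == b)                # True
print([ord(c) for c in a])   # [99, 97, 102, 233]
```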

David


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

