Subject: Re: [xsl] How to read the encoding of an XML document
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 25 Oct 2001 14:14:43 -0400
Cheers, Wendell
> When you say Unicode, does that equate to UTF-8, UTF-16, UTF-32 or
> something else?

No. Unicode is essentially an abstract collection of characters, numbered from 0 to hexadecimal 10FFFF (most of those slots are empty). An XML character reference such as &#333; refers to abstract character number 333, ō.
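To make this concrete, here is a minimal Python sketch (Python and its standard xml.etree.ElementTree module are chosen only for illustration; this is not what MSXML or XSLT uses). It assumes only that a conforming parser resolves character references: the decimal reference, the hex reference and the literal character all come out as the same abstract character, number 333.

```python
import xml.etree.ElementTree as ET

# The same abstract character written three ways in XML.
from_dec_ref = ET.fromstring("<a>&#333;</a>").text   # decimal character reference
from_hex_ref = ET.fromstring("<a>&#x14D;</a>").text  # hexadecimal character reference
literal = ET.fromstring("<a>\u014d</a>").text        # the character itself, ō

print(from_dec_ref == from_hex_ref == literal)  # True
print(ord(literal), hex(ord(literal)))          # 333 0x14d

# Code points stop at 0x10FFFF; anything above is not a Unicode character.
print(hex(ord(chr(0x10FFFF))))  # 0x10ffff
try:
    chr(0x110000)
except ValueError:
    print("0x110000 is outside the Unicode range")
```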
However, to store Unicode strings in files (and other places) you need some encoding that maps the bytes in the file to these characters. The UTF-x encodings are some of those encodings (all UTF encodings can encode the whole Unicode range); other encodings, such as ASCII or Latin-1, work the same way but can't encode the whole range of characters.
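A small Python sketch of the same idea, assuming Python's built-in codecs: one abstract character, ō, becomes different byte sequences under different UTF encodings, while Latin-1, which only covers the first 256 code points, has no byte for it at all. The big-endian codec variants are used so that no byte-order mark is added.

```python
text = "\u014d"  # ō, abstract character number 333 (U+014D)

# Each UTF encoding maps the same character to different bytes.
encoded = {codec: text.encode(codec) for codec in ("utf-8", "utf-16-be", "utf-32-be")}
for codec, data in encoded.items():
    print(f"{codec:10} {data.hex(' ')}")
# utf-8      c5 8d
# utf-16-be  01 4d
# utf-32-be  00 00 01 4d

# Latin-1 only covers U+0000..U+00FF, so it has no byte for ō.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print("latin-1 can encode it:", latin1_ok)  # False
```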
> Or does the answer depend upon the XML parser you are
> using, which in my case is MSXML3.0?
No. Internally the parser obviously has to use some encoding to store things (often this is UTF-16, as it is in the case of MSXML). In some programming APIs you need to know this, because you get handed the string in that form, but in XSLT you never need to know what happens internally. Your XSLT stylesheet is an XML document, so it goes through the same process.
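As an illustration of why the internal form can matter in some programming APIs but not in XSLT, this Python sketch counts the same string two ways. Python's str exposes code points, while APIs built on UTF-16 strings (for example Java's String.length() or JavaScript's length) count UTF-16 code units; the two differ for characters above U+FFFF, which need a surrogate pair.

```python
# 'a', 'ō' (U+014D), and U+1F600, which lies outside the Basic Multilingual Plane.
s = "a\u014d\U0001F600"

# Python counts abstract characters (code points).
code_points = len(s)

# A UTF-16 based API would see code units: 2 bytes each, no BOM with utf-16-le.
utf16_units = len(s.encode("utf-16-le")) // 2

print(code_points)  # 3
print(utf16_units)  # 4, because U+1F600 needs a surrogate pair
```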
Character data in the stylesheet is mapped to abstract Unicode characters (using the encoding specified in the stylesheet), and the same happens for the source document. It is these abstract characters that are compared, so by then you don't need to know (and can't find out) what encoding the original files used.
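Outside XSLT, the declared encoding is only visible at the parser level, before the tree is built. A minimal sketch using Python's pyexpat binding, assuming the document carries an XML declaration: the XmlDeclHandler callback receives the version, encoding and standalone values. The helper name declared_encoding is hypothetical. Note that it reports what the declaration says, not what the parser detected: a document without a declaration returns None, even though the parser still decodes it (as UTF-8 or UTF-16).

```python
import xml.parsers.expat


def declared_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None if absent (hypothetical helper)."""
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # encoding is None when the declaration omits the encoding pseudo-attribute
        found["encoding"] = encoding

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return found.get("encoding")


doc = '<?xml version="1.0" encoding="ISO-8859-1"?><a>\u00e9</a>'.encode("iso-8859-1")
print(declared_encoding(doc))      # ISO-8859-1
print(declared_encoding(b"<a/>"))  # None
```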
So your source might be in Latin-2 and your stylesheet in Latin-1, but by the time both have been parsed everything is in abstract Unicode characters, and it is these that are compared in any XSLT query. (In fact MSXML3 uses UTF-16, but this is an internal detail that has no effect on the stylesheet.)
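A minimal Python sketch of that point, using the standard xml.etree.ElementTree parser: the same text is serialized in three encodings, each with a matching declaration, and after parsing all three yield identical strings. It uses ISO-8859-1, UTF-8 and UTF-16 rather than the Latin-2 of the example because expat supports those natively; the helper make_doc is hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT = "caf\u00e9"  # 'café'


def make_doc(encoding: str) -> bytes:
    """Serialize a tiny document in the given encoding, declaring that encoding (hypothetical helper)."""
    source = f'<?xml version="1.0" encoding="{encoding}"?><a>{TEXT}</a>'
    return source.encode(encoding)


docs = {enc: make_doc(enc) for enc in ("ISO-8859-1", "UTF-8", "UTF-16")}

# The raw bytes differ per encoding...
print({enc: len(data) for enc, data in docs.items()})

# ...but after parsing, every document holds the same abstract characters.
parsed = {enc: ET.fromstring(data).text for enc, data in docs.items()}
print(set(parsed.values()))  # {'café'}
```

Any comparison done on the parsed text, as an XSLT processor would do, therefore cannot tell which of the three files it came from.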
David
======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================