RE: [xsl] How to read the encoding of an XML document

Subject: RE: [xsl] How to read the encoding of an XML document
From: James Garriss <jpgarriss@xxxxxxxx>
Date: Thu, 25 Oct 2001 13:50:04 -0400
I asked:

> When you say Unicode, does that equate to UTF-8, UTF-16, UTF-32 or
> something else?  Or does the answer depend upon the XML parser you are
> using, which in my case is MSXML3.0?

Michael Kay wrote:


When the XML is in a file on disk, each Unicode character is represented by
one or more bytes, so it's reasonable to talk about encoding. When the XML
has been parsed and is passed to your application via an API, the characters
are typically variables of some data type depending on your programming
language, so their binary representation is no longer of any concern.

David Carlisle wrote:

So your source might be in latin-2 and your stylesheet might be in
latin-1 but by the time they have both been parsed everything is in
abstract unicode characters and it is these that are compared
in any XSLT query. (In fact MSXML3 uses utf16 but this is an internal
detail that has no effect on the stylesheet.)

Ok, I think you two are saying the same thing. As long as my XML and my XSL are DOMDocuments, encoding is not relevant.


In my case, they aren't going to stay DOMDocuments for long. I'm going to call transformNodeToObject and save the results to a file, either as XML or HTML, depending upon the xsl:output method.

If I no longer know what my original XML document was encoded as, how do I know the appropriate encoding to specify for the output?

In XML I was going to do xsl:output encoding="whatever the input xml was"

In HTML I was going to do META content="text/html; charset=whatever the input XML was"
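One workaround I can imagine is to look at the raw bytes of the input file before it is parsed and pull the encoding out of the XML declaration. A rough sketch in Python follows; declared_encoding is a hypothetical helper of my own, not an MSXML call. It assumes an ASCII-compatible encoding, so it would not handle a UTF-16 file, and it falls back to UTF-8 when there is no declaration, which is the XML default in the absence of a byte order mark:

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
# [^>]* cannot run past the closing "?>" of the declaration.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

UTF8_BOM = b"\xef\xbb\xbf"


def declared_encoding(raw: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or the default.

    Hypothetical helper: only works for ASCII-compatible encodings,
    because it reads the declaration as ASCII bytes.
    """
    if raw.startswith(UTF8_BOM):
        raw = raw[len(UTF8_BOM):]  # skip a UTF-8 byte order mark
    match = _DECL.match(raw)
    return match.group(1).decode("ascii") if match else default


sample = b'<?xml version="1.0" encoding="ISO-8859-2"?><city/>'
found = declared_encoding(sample)
fallback = declared_encoding(b"<city/>")
```

The returned name could then be written into the xsl:output encoding attribute or the META charset before the transform runs.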

Very much appreciating the expert responses,

--James Garriss


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


