Re: [xsl] How to read the encoding of an XML document

Subject: Re: [xsl] How to read the encoding of an XML document
From: "Thomas B. Passin" <tpassin@xxxxxxxxxxxx>
Date: Thu, 25 Oct 2001 14:46:26 -0400
[James Garriss]


> If I no longer know what my original XML document was encoded as, how do I
> know the appropriate encoding set to specify for the output?
>
>   In XML I was going to do xsl:output encoding="whatever the input xml
> was"
>
>   In HTML I was going to do META content="text/xml; charset=whatever the
> input XML was"
>

As you can tell by now, if you want the original encoding you have to get it
from the document itself, separately from any xslt transformation, because
the stylesheet never sees the XML declaration.  Another point to think about
is that sometimes an encoding declaration is mistaken and the document
actually contains byte sequences that are illegal in the claimed encoding.
An xml parser (or the xslt processor's front end) will normally catch that.
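
Here is a minimal sketch of such a separate check, in Python rather than
xslt.  The function names declared_encoding and decodes_cleanly are my own
inventions, not part of any library.  It assumes an ASCII-compatible encoding
or a byte order mark; UTF-16 without a BOM, EBCDIC and the like are not
handled.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the data. Only works when the declaration itself is ASCII-readable.
_XML_DECL = re.compile(
    br'^<\?xml(?=\s)[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or `default`."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The declaration is not ASCII-readable here; trust the BOM instead.
        return "UTF-16"
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else default


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """True if `data` is a legal byte sequence in `encoding`."""
    try:
        data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # LookupError: Python does not know the encoding name at all.
        return False
    return True
```

You would read the file in binary mode, pass the bytes to declared_encoding,
and optionally use decodes_cleanly to spot a declaration that does not match
the actual content.  Note that some encodings, ISO-8859-1 for one, accept
any byte, so a clean decode does not prove the declaration is right.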

Given that you have to do something outside of a stylesheet, here's the
simplest thing I can think of.  Create a set of identity transform
stylesheets (check the FAQs if you aren't sure how to do that), each one
with a different output encoding that you wish to support.  Capture the
encoding with a separate non-xslt program.  After you run the main
transformation, run the result through the right identity transform,
selected based on the encoding you captured.
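
The identity-transform approach might be set up roughly like this.  It is
only a sketch: the list of supported encodings, the file naming scheme and
the helper names (write_identity_stylesheets, pick_stylesheet) are all
assumptions of mine, and running the chosen stylesheet is left to whatever
xslt processor you use.

```python
from pathlib import Path

# The usual XSLT 1.0 identity transform, with the output encoding left open.
IDENTITY_TEMPLATE = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="{encoding}"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
"""

# Hypothetical list; use whichever encodings you wish to support.
SUPPORTED_ENCODINGS = ("UTF-8", "ISO-8859-1", "UTF-16")


def write_identity_stylesheets(directory):
    """Write one identity stylesheet per supported encoding.

    Returns a dict mapping the upper-cased encoding name to the file path.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    stylesheets = {}
    for encoding in SUPPORTED_ENCODINGS:
        path = directory / f"identity-{encoding.lower()}.xsl"
        path.write_text(IDENTITY_TEMPLATE.format(encoding=encoding),
                        encoding="ascii")
        stylesheets[encoding.upper()] = path
    return stylesheets


def pick_stylesheet(stylesheets, captured_encoding, fallback="UTF-8"):
    """Select the stylesheet for the captured encoding (case-insensitive)."""
    return stylesheets.get(captured_encoding.upper(), stylesheets[fallback])
```

Falling back to UTF-8 for an encoding you do not support is a design choice
in this sketch; you may prefer to fail loudly instead.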

This might be slow on large files, though, since the document gets parsed
and transformed twice: once by the main stylesheet and once by the identity
transform.

Another approach is to dynamically build the stylesheet before use, changing
only the encoding, again using non-xslt methods.  This would be very easy
and fast.  You would use the encoding you discovered to build the stylesheet
version you wanted.
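
One simple way to do that, sketched in Python: keep the stylesheet on disk
with a placeholder where the encoding goes, and substitute the real value
just before handing it to the processor.  The placeholder token
@@ENCODING@@ and the function name build_stylesheet are assumptions of
mine.  Plain text replacement is used on purpose, so the rest of the
stylesheet (namespace prefixes, comments, $variables) is left untouched.

```python
import re

# Hypothetical marker; the stylesheet on disk would contain, for example,
#   <xsl:output method="xml" encoding="@@ENCODING@@"/>
PLACEHOLDER = "@@ENCODING@@"

# Encoding names as the XML spec allows them: a letter, then letters,
# digits, '.', '_' or '-'.  Anything else is rejected rather than inserted.
_ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def build_stylesheet(template_text: str, encoding: str) -> str:
    """Return the stylesheet text with the output encoding filled in."""
    if not _ENC_NAME.fullmatch(encoding):
        raise ValueError(f"not a valid encoding name: {encoding!r}")
    if PLACEHOLDER not in template_text:
        raise ValueError("stylesheet template has no encoding placeholder")
    return template_text.replace(PLACEHOLDER, encoding)
```

The only thing that changes between runs is the encoding attribute, so
this costs next to nothing compared with a second transformation.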

Are you sure you don't just want to output utf-8 and be done with it?

Cheers,

Tom P


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

