Re: [xsl] xalan encoding issues?

Subject: Re: [xsl] xalan encoding issues?
From: Joseph Kesselman <keshlam@xxxxxxxxxx>
Date: Fri, 4 Oct 2002 09:25:01 -0400
Just a reminder that Xalan has its own mailing lists too, which may be a 
better place to ask questions specifically about this processor; 
you'll reach a higher concentration of users/experts/developers of that 
tool there. (Questions about XSL/XSLT in general are probably best kept 
here, of course.)

>1.Does Xalan allow all kinds of character encoding
>or just some specific encodings ?

On input, Xalan accepts whatever encodings the parser will accept. Xalan 
generally ships with Xerces or Crimson, so check their docs. Of course you 
can plug in other parsers by passing Xalan a SAXSource or (currently less 
efficient) DOMSource.
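
As a rough illustration of the SAXSource route, here is a minimal sketch using the standard JAXP interfaces (javax.xml.transform), which Xalan implements. The class and method names are made up for the example, and the parser is simply whatever SAXParserFactory returns on your classpath; the point is only that the parser, not the XSLT processor, decodes the ISO-8859-1 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical example class; not part of Xalan.
class SaxSourceSketch {

    /** Identity-transforms a byte-encoded document read through a caller-chosen SAX parser. */
    static String transformWithOwnParser(byte[] xmlBytes) {
        try {
            // Any SAX2 XMLReader could be substituted here (assumption: the
            // JAXP default parser is good enough for the demonstration).
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // XSLT processing needs namespace-aware events
            XMLReader reader = spf.newSAXParser().getXMLReader();

            // The parser reads the encoding declaration and decodes the bytes.
            InputSource input = new InputSource(new ByteArrayInputStream(xmlBytes));
            SAXSource source = new SAXSource(reader, input);

            // newTransformer() with no stylesheet gives an identity transform.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><word>caf\u00e9</word>";
        System.out.println(transformWithOwnParser(doc.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

If the chosen parser does not support the declared encoding, the failure would surface from the parser during the transform() call.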

On output... Obviously, if you ask Xalan to produce SAX or DOM output, 
we'll use the encoding which is native to that API, generally UTF-16. If 
you ask for character-stream output, our serializer supports a reasonably 
wide range of encodings. Not "all" -- we're still looking for an 
open-source encoding support package which covers a wider range -- but we 
don't seem to be getting a lot of complaints that a needed encoding 
isn't supported.

>2.If the encoding is not specified in the xml file,
>what does xalan assume as a default encoding?

On input: That's up to the parser again -- but the XML standard says that, 
absent an encoding declaration, parsers should assume UTF-8, or UTF-16 if 
the document starts with a byte-order mark.

On output: UTF-16 for SAX or DOM output, UTF-8 for character streams.
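
To make the two output cases concrete, here is a small sketch, again using only the standard JAXP interfaces; the class and method names are invented for the example. A DOM result hands back Java strings with no byte encoding involved, while a byte-stream result with no encoding requested should come out as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Result;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; not part of Xalan.
class OutputDefaultsSketch {
    static final String XML = "<word>caf\u00e9</word>";

    /** Runs an identity transform of XML into the given result. */
    static void identity(Result result) {
        try {
            TransformerFactory.newInstance().newTransformer()
                .transform(new StreamSource(new StringReader(XML)), result);
        } catch (Exception e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    /** DOM output: the text arrives as a Java String, never as encoded bytes. */
    static String textViaDom() {
        DOMResult dom = new DOMResult();
        identity(dom);
        // With no node supplied, the result holds a new Document whose
        // first child is the <word> element.
        return dom.getNode().getFirstChild().getTextContent();
    }

    /** Stream output with no encoding requested: the serializer picks its default. */
    static byte[] bytesViaStream() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        identity(new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(textViaDom());
        System.out.println(new String(bytesViaStream(), StandardCharsets.UTF_8));
    }
}
```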

>3.If i do not specify the encoding in my xml file, and if the 
>xml has for  example arabic or polish characters,would it still 
>open in my IE?

That's an IE question, not a Xalan question.

>4. If the XML file is has an encoding iso-8859-1 and if the 
>xml is a valid xml, what is the effect of
>  1. Having an xml declaration with encoding in the xsl??
>  2. Having an xsl:output with encoding specified?

This too does not seem to be a Xalan question. 

If you specify an encoding in the XML declaration, that affects how the 
document is read -- so in this case, you're specifying what characters are 
acceptable in the stylesheet document and how they're translated. That 
occurs at the parser level. 

If you specify encoding in xsl:output, that's a request that the generated 
document be written out in a specific encoding. Any XSLT processor should 
attempt to comply with that request. If the encoding is not one of those 
which is supported, the XSLT 1.0 spec lets the processor either signal an 
error or fall back to UTF-8 or UTF-16.
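
For the xsl:output case, a minimal sketch follows. It assumes a JAXP-compliant processor on the classpath; the class name, method name, and the tiny copy-everything stylesheet are made up for the example. The stylesheet asks for ISO-8859-1, and the result is written to a byte stream so the requested encoding actually applies.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; not part of Xalan.
class XslOutputEncodingSketch {
    // Copies the input unchanged, but requests ISO-8859-1 for the serialized output.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' encoding='ISO-8859-1'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms the given document and returns the raw serialized bytes. */
    static byte[] run(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // A byte stream (not a Writer) lets the serializer do the encoding.
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = run("<word>caf\u00e9</word>");
        // Decode with the encoding we asked for to inspect the result.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Note that the encoding request only matters when the processor serializes to bytes; if the result is a Writer, SAX events, or a DOM tree, the caller ends up controlling the final byte encoding.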

Joe Kesselman  / IBM Research
