Re: [xsl] Original encoding lost after transformation

Subject: Re: [xsl] Original encoding lost after transformation
From: David Carlisle <davidc@xxxxxxxxx>
Date: Fri, 18 Feb 2005 11:30:58 GMT
The encoding of the source file is interpreted by the XML parser, which
just reports Unicode characters in an encoding-neutral way, so XSLT
doesn't have any record of the original encoding.
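
As a rough illustration (a sketch only, assuming the JAXP parser bundled
with the JDK; the class and method names are made up for this example),
the same document stored in two different encodings comes out of the
parser as identical characters:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of any library.
final class EncodingNeutralParse {

    // Builds the same small document as bytes in the given encoding,
    // parses it, and returns the text of the root element.
    static String parseText(String encodingName, Charset charset) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"" + encodingName + "\"?>"
                   + "<p>caf\u00e9</p>";
        byte[] bytes = xml.getBytes(charset);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String fromLatin1 = parseText("iso-8859-1", StandardCharsets.ISO_8859_1);
        String fromUtf8 = parseText("utf-8", StandardCharsets.UTF_8);
        // The byte sequences differ, the parsed characters do not.
        System.out.println(fromLatin1.equals(fromUtf8)); // prints true
    }
}
```

A stylesheet run over either parse sees the same characters, which is
why it cannot tell which encoding the file was stored in.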

You can ask for the output to be in Latin-1 by using
<xsl:output encoding="iso-8859-1"/>
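
A minimal sketch of this from Java, assuming the standard JAXP API (the
stylesheet, the <out> element and the class name are invented for the
example). The requested encoding only applies when the processor itself
serialises to a byte stream, so the result goes to an OutputStream
rather than a Writer:

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of any library.
final class Latin1Output {

    // Illustrative stylesheet: copies the text of <p> into <out> and
    // asks for iso-8859-1 serialisation.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output encoding=\"iso-8859-1\"/>"
      + "<xsl:template match=\"/\">"
      + "<out><xsl:value-of select=\"/p\"/></out>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet and returns the serialised bytes.
    static byte[] transform(String sourceXml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A byte-stream result lets the processor apply xsl:output's encoding.
        t.transform(new StreamSource(new StringReader(sourceXml)),
                    new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = transform("<p>caf\u00e9</p>");
        // Decode with the same encoding that was requested for output.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```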

Note however that every XML processor has to accept UTF-8, but support
for iso-8859-1 is an optional feature, so (in theory at least) your
resulting document will be less portable.

You say you got:

  Some transformations (depending on the source xml) now fail on the
  second pass as the first pass returns data with the wrong UTF-8 declaration
  above. The error message is:
  
  java.io.UTFDataFormatException: Invalid byte 2 of 2-byte UTF-8 sequence

It's most unlikely that XSLT writes out a file with a bad encoding
declaration so this is probably an indication of another error
somewhere in your processing chain.
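
One way a mismatch like that can arise (this is only a guess at the
kind of error, nothing in your message says this is what happened) is
that bytes written in one encoding get labelled as another. A small
sketch, with made-up names, showing a parser rejecting Latin-1 bytes
that claim to be UTF-8:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo class, not part of any library.
final class MislabelledBytes {

    // The declaration says UTF-8; whether that is true depends on how
    // the string is turned into bytes below.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p>\u00d6l</p>";

    // Returns true if the bytes parse as XML, false on any parse failure.
    static boolean parses(byte[] bytes) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Latin-1 bytes under a UTF-8 declaration: malformed UTF-8.
        System.out.println(parses(XML.getBytes(StandardCharsets.ISO_8859_1)));
        // Genuine UTF-8 bytes under the same declaration: fine.
        System.out.println(parses(XML.getBytes(StandardCharsets.UTF_8)));
    }
}
```

If something between your two passes converts the bytes, or writes
characters out through a stream using a different encoding from the one
declared, you would see this kind of failure.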

Note that if you are doing two transformations back to back from Java
(or any other similar API) you almost certainly should not be encoding
and decoding the intermediate stage at all. Most XSLT processors will
allow you to pass the result tree of the first transform directly to the
second transform without having to serialise it in some encoding and
then re-parse it.
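
A minimal sketch of that with the standard JAXP API, assuming a DOM
tree as the intermediate form (the two stylesheets and the class name
are invented for the example; a SAX pipeline would be another option):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of any library.
final class ChainedTransforms {

    // Builds a one-template stylesheet that wraps the text of <from> in <to>.
    static String wrap(String from, String to) {
        return "<xsl:stylesheet version=\"1.0\""
             + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:template match=\"/\">"
             + "<" + to + "><xsl:value-of select=\"/" + from + "\"/></" + to + ">"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    static String run(String sourceXml) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer first = factory.newTransformer(
                new StreamSource(new StringReader(wrap("p", "mid"))));
        Transformer second = factory.newTransformer(
                new StreamSource(new StringReader(wrap("mid", "out"))));

        // First pass: the result stays in memory as a tree, so no
        // encoding or decoding happens between the two passes.
        DOMResult intermediate = new DOMResult();
        first.transform(new StreamSource(new StringReader(sourceXml)), intermediate);

        // Second pass reads that tree directly; only the final result
        // is serialised.
        StringWriter out = new StringWriter();
        second.transform(new DOMSource(intermediate.getNode()), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<p>caf\u00e9</p>"));
    }
}
```

Since the intermediate result is never turned into bytes, there is no
encoding declaration in the middle that could be wrong.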

David

