RE: [xsl] encoding of text files

Subject: RE: [xsl] encoding of text files
From: "Yates, Danny (ANTS)" <danny.yates@xxxxxxxxxx>
Date: Thu, 14 Nov 2002 10:48:41 -0000

Hi Joerg,

If you are outputting UTF-8 then your a-umlaut will be written as
a two-byte sequence. If your output is serialised XML or HTML then
this is fine, as there are headers which can declare that the
content is UTF-8 encoded. If, however, you are writing a plain text
file (as you say you are), there is no way for the process which
reads it in to determine whether it is UTF-8, ASCII, iso-8859-1 or
whatever.

The first string you give would appear to indicate that there are,
as expected, two bytes in the output stream where you expect your
a-umlaut character to appear, and the program you are using to
view this file doesn't understand this.
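
To make that concrete, here is a minimal Python sketch (Python is only
used for illustration; it is not part of your XSLT setup, and the
variable names are made up for this example). It shows the two bytes
UTF-8 produces for a-umlaut and what a reader sees if it assumes
iso-8859-1:

```python
# Sample word from the original question; \u00e4 is a-umlaut.
word = "ge\u00e4ndert"

# In UTF-8, a-umlaut (U+00E4) becomes the two bytes 0xC3 0xA4.
utf8_bytes = word.encode("utf-8")
print(utf8_bytes)                  # b'ge\xc3\xa4ndert'
print(len(word), len(utf8_bytes))  # 8 9

# A viewer that assumes iso-8859-1 turns every byte into one character,
# so the two bytes show up as two separate characters (U+00C3, U+00A4).
misread = utf8_bytes.decode("iso-8859-1")
print(len(misread))                # 9
```

The bytes on disk are correct UTF-8; only the interpretation by the
reader is wrong.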

When you ask XSLT to output using iso-8859-1, it knows that in this
encoding there is a single-byte representation of a-umlaut. It uses
this, and it is correctly interpreted by your viewing program.
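
The single-byte case can be sketched the same way (again a Python
illustration with made-up variable names, not anything your XSLT
processor exposes). It also shows that the iso-8859-1 bytes are not
valid UTF-8, so the two encodings cannot simply be swapped on the
reading side:

```python
word = "ge\u00e4ndert"  # a-umlaut is U+00E4

# In iso-8859-1, a-umlaut is the single byte 0xE4.
latin1_bytes = word.encode("iso-8859-1")
print(latin1_bytes)       # b'ge\xe4ndert'
print(len(latin1_bytes))  # 8

# Read back as iso-8859-1, the text is intact.
round_trip = latin1_bytes.decode("iso-8859-1")
print(round_trip == word)  # True

# The same bytes are rejected by a strict UTF-8 decoder.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)          # False
```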

So, if you must write out UTF-8 (and it's quite possible that you
can get by with iso-8859-1 if you're just using a few simple accented
characters, such as those found in French and German), then you need
to tell your viewing program that the byte stream you are feeding it
is a UTF-8 encoded character stream.
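
As a rough illustration of "telling the reader", the Python sketch
below writes a UTF-8 text file and reads it back twice, once with the
right encoding and once with the wrong one. The file name and the
fits_latin1 helper are hypothetical, chosen only for this example; how
you set the encoding in your own viewer or IDE will differ.

```python
import os
import tempfile

text = "ge\u00e4ndert"

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "Generated.java")

# Write the text as UTF-8, as the default XSLT output encoding would.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# A reader that is told the file is UTF-8 recovers the original text.
with open(path, encoding="utf-8") as f:
    as_utf8 = f.read()

# A reader that assumes iso-8859-1 sees two characters for a-umlaut.
with open(path, encoding="iso-8859-1") as f:
    as_latin1 = f.read()

print(as_utf8 == text)    # True
print(as_latin1 == text)  # False


def fits_latin1(s):
    """Return True if every character of s exists in iso-8859-1."""
    try:
        s.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1(text))      # True: a-umlaut is in iso-8859-1
print(fits_latin1("\u20ac"))  # False: the euro sign is not
```

The last two lines show the trade-off: iso-8859-1 is enough for German
umlauts, but characters outside its range cannot be written at all.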

Regards,

Dan.

-- 
Danny Yates
Technical Architect
Abbey National Treasury Services
E-mail: Danny.Yates@xxxxxxxxxx
Phone: +44 20 7756 5012
Fax: +44 20 7612 4342


-----Original Message-----
From: Joerg Heinicke [mailto:joerg.heinicke@xxxxxx]
Sent: 14 November 2002 10:03
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] encoding of text files


Hello,

I have a problem with generated java/text files and their encoding.

From an autotest description a Java file is generated. If I use the default
output encoding (UTF-8), the German umlauts in the output look like this:
            "geÃ¤ndert"

If I use ISO-8859-1 it's correct:
            "geändert"

I use NetBeans, which in general understands UTF-8 (with XML), but I don't
know whether it understands UTF-8 in text files. At least the output of the
Java file is also wrong.

It should be possible to have text files in UTF-8, shouldn't it? What can
then be the problem? How are text files marked as UTF-8?

With pure XML, encoding seems simple, but what about text files? Can somebody
enlighten me or point to some resources?

Joerg

-- 

System Development
VIRBUS AG
Fon  +49(0)341-979-7419
Fax  +49(0)341-979-7409
joerg.heinicke@xxxxxxxxx
www.virbus.de


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

