RE: [xsl] encoding of text files

Subject: RE: [xsl] encoding of text files
From: "Julian Reschke" <julian.reschke@xxxxxx>
Date: Thu, 14 Nov 2002 12:06:55 +0100
Just a thought: it may make sense to prefix the text file with a UTF-8 BOM
(as far as I remember, at least Notepad on Windows honors this).
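
As a rough sketch of the idea (Python is used purely for illustration, and
the sample text and temporary file path are made up), the BOM is just the
three bytes EF BB BF written ahead of the UTF-8 encoded text. Python's
"utf-8-sig" codec adds it on write and strips it on read:

```python
import codecs
import os
import tempfile

# Sample text containing a-umlaut (U+00E4); purely illustrative.
text = "M\u00e4dchen\n"

# Hypothetical output path, chosen only for this example.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# "utf-8-sig" writes the BOM (EF BB BF) before the encoded text.
# newline="" keeps "\n" as-is so the byte comparison below is portable.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    f.write(text)

# Read the raw bytes back to see what a viewer would actually receive.
with open(path, "rb") as f:
    raw = f.read()

has_bom = raw.startswith(codecs.BOM_UTF8)

# Decoding with "utf-8-sig" drops the BOM again, leaving the original text.
decoded = raw.decode("utf-8-sig")
```

Whether a given viewer pays attention to the BOM still has to be checked
for that viewer; the sketch only shows what ends up in the file.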

--
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx]On Behalf Of Yates, Danny
> (ANTS)
> Sent: Thursday, November 14, 2002 11:49 AM
> To: 'xsl-list@xxxxxxxxxxxxxxxxxxxxxx'
> Subject: RE: [xsl] encoding of text files
>
>
> Hi Joerg,
>
> If you are outputting UTF-8 then your a-umlaut will be written as
> a two-byte sequence. If your output is serialised XML or HTML then
> this is fine, as there are headers which can declare that the
> content is UTF-8 encoded. If, however, you are writing a plain text
> file (as you say you are), there is no way for the process which
> reads it in to determine whether it is UTF-8, ASCII, iso-8859-1 or
> whatever.
>
> The first string you give would appear to indicate that there are,
> as expected, two bytes in the output stream where you expect your
> a-umlaut character to appear, and the program you are using to
> view this file doesn't understand this.
>
> When you ask XSLT to output using iso-8859-1, it knows that in this
> encoding there is a single-byte representation of a-umlaut, and it
> uses this, and it is correctly interpreted by your viewing program.
>
> So, if you must write out UTF-8 (and it's quite possible that you
> may be able to survive with iso-8859-1 if you're just using a few
> simple accented characters, such as those in French and German), then you
> need to tell your viewing program that the byte stream you are
> feeding it is a UTF-8 encoded character stream.
>
> Regards,
>
> Dan.
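
The byte-level point in the quoted message can be checked with a small
sketch, assuming Python only as a convenient way to inspect bytes (the
variable names are illustrative). It encodes a-umlaut in both encodings and
then decodes the UTF-8 bytes the way a viewer that assumes iso-8859-1 would:

```python
# a-umlaut is the code point U+00E4.
a_umlaut = "\u00e4"

# UTF-8 needs two bytes for this character; iso-8859-1 needs one.
utf8_bytes = a_umlaut.encode("utf-8")
latin1_bytes = a_umlaut.encode("iso-8859-1")

# A viewer that assumes iso-8859-1 treats each UTF-8 byte as its own
# character, so the single a-umlaut shows up as two unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes.hex(), latin1_bytes.hex(), len(misread))
```

The two-character result is the typical symptom of UTF-8 output being read
as iso-8859-1.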


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

