Re: [xsl] Unicode usage

Subject: Re: [xsl] Unicode usage
From: "Jonathan Perret" <jonathan@xxxxxxxxxxxx>
Date: Fri, 25 Jan 2002 18:02:04 +0100
> I also loaded each result into Notepad on Win95.  Notepad displayed the iso
> file correctly, but not the utf-8 result (it showed that "A" character with
> a little circle above it), ahead of the trademark symbol.  This is what I
> was suggesting would happen. BTW, Notepad on the Win2000 computer did
> display both results correctly.

I don't see what this proves that wasn't already obvious. Notepad on
Windows 95 supports only one encoding, the one that matches the installed
code page. On a Western version that is generally windows-1252 (what
Windows calls 'ANSI', or sometimes even 'ASCII' -yuk!-).
Feeding it utf-8 text, regardless of the actual codepoints used, is akin
to opening a Word document with it: though some text might appear
readable, the general result is garbage.
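
To make the mismatch concrete, here is a minimal Python sketch of what a
single-encoding viewer does with utf-8 bytes. The helper name view_as is
made up for illustration, and the second case (a stray U+0099 in the
output) is only one plausible way to end up with an accented "A" ahead of
the trademark symbol; I have not seen the actual file.

```python
def view_as(text: str, written_as: str, read_as: str) -> str:
    """Encode text one way, then decode the bytes the way the viewer assumes."""
    return text.encode(written_as).decode(read_as)


# U+2122 TRADE MARK SIGN becomes the three bytes E2 84 A2 in utf-8.
# A windows-1252 viewer shows each byte as a separate character.
garbled = view_as("\u2122", "utf-8", "cp1252")

# Hypothetical case: the stylesheet emitted U+0099 (the windows-1252 byte
# value for the trademark sign, used as if it were a Unicode codepoint).
# utf-8 encodes it as C2 99, which windows-1252 shows as "A circumflex"
# followed by the trademark sign.
accented = view_as("\u0099", "utf-8", "cp1252")

# With matching encodings the text survives the round trip unchanged.
intact = view_as("\u2122", "cp1252", "cp1252")

if __name__ == "__main__":
    # ascii() keeps the output printable on any console encoding.
    print(ascii(garbled), ascii(accented), ascii(intact))
```

Nothing here depends on the OS; the outcome is fixed by the pair of
encodings chosen by the writer and assumed by the reader.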

On Windows 2000, Notepad has been upgraded to know about UTF-8,
so again it's no surprise that it can display the text correctly, given
that the file probably starts with a BOM (byte order mark), which signals
that it is utf-8 encoded. Note that without the BOM, Windows 2000
Notepad would probably have 'failed' the same way as its Win95
cousin, since it would have assumed an ANSI-encoded file.
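
The BOM-then-fallback behaviour described above can be sketched in a few
lines of Python. This is an assumption about the general approach, not
Notepad's actual detection code; the function name and the cp1252 fallback
are chosen for illustration only.

```python
import codecs


def decode_like_bom_aware_viewer(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode as utf-8 only when a utf-8 BOM is present, else use the fallback."""
    if raw.startswith(codecs.BOM_UTF8):
        # Strip the three BOM bytes (EF BB BF) and decode the rest as utf-8.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # No BOM: assume the 'ANSI' code page, as a single-encoding viewer would.
    return raw.decode(fallback)


payload = "\u2122".encode("utf-8")  # trademark sign as utf-8 bytes

with_bom = decode_like_bom_aware_viewer(codecs.BOM_UTF8 + payload)
without_bom = decode_like_bom_aware_viewer(payload)

if __name__ == "__main__":
    # The first value is the intact trademark sign, the second is garbage.
    print(ascii(with_bom), ascii(without_bom))
```

The same bytes give the right character with the BOM and three wrong
ones without it, which is the 'failure' I'd expect from Win2000 Notepad
on a BOM-less file.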

> Summarizing, what you will see displayed for high-order characters can
> depend on the encoding, OS,  and the viewing program.  On older versions of
> Windows, at least, non-browsers are likely to display the wrong thing.

The fact is that what you will see is completely predictable (give or take
the odd bug). If the viewing program is not told in what encoding the
text is, it will assume an encoding that will quite frequently be wrong.

In the Notepad example, the OS itself has nothing to do with the issue:
Notepad/Win95 and Notepad/Win2000 are two very different programs.
If you were to take the Win95 Notepad binary and run it under Win2000,
it'd behave exactly the same as under Win95. Why not try this?

> In fact, even on my Win2000 machine, using XML Cooktop to run and display
> the transformation gave an incorrect display (and it uses the IE activeX
> control to display the results!), so you can't be sure even on Win2000 that
> high order characters will display the intended way, depending on the app.

If XML Cooktop (which I've never used) has the same bug as XML Spy,
then it has trouble with MSXML's transformNode method, which always
transforms to UTF-16 regardless of the <xsl:output> element. That would
cause what you've been seeing.

Cheers,
--Jonathan



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

