Re: [xsl] International Characters in attributes

Subject: Re: [xsl] International Characters in attributes
From: "Michael Beddow" <mbnospam@xxxxxxxxxxx>
Date: Sun, 11 Feb 2001 18:02:48 -0000
On Saturday, February 10, 2001 1:55 PM
David Carlisle wrote:

(MB)>> Oh but it does, and how! That's a very WesternEuro-centric
> Or a standards conforming view, depending on how you view it.
> If utf-8 encoding
> is used then any XML system must support it, and most modern HTML
> systems will as well, won't they?

Sure, as long as you stick to the nice safe areas where utf-8 and
ASCII coincide. Even there, as Jo Bourne recently reported here, many
fairly recent builds of Netscape for the Mac go bananas even on ASCII
range characters if they're told the encoding is utf-8. And some
builds of Netscape for various Unices refuse to render html at all,
treating it as plain text, if you try to set utf-8 encoding via an HTTP
header rather than a meta tag. But once you get into the areas of the
BMP where utf-8 starts producing the "transformations" that the "t"
stands for, with 2- or 3-byte sequences, none of the browsers I've
looked at will behave 100% properly (and some XML parsers and XSLT
engines can hiccup as well).
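
To make the byte counts concrete, here is a small Python sketch (the sample characters and the helper name utf8_length are my own choices for illustration, nothing from the thread). It shows where utf-8 and ASCII coincide and where multi-byte sequences begin:

```python
# Sample characters picked purely for illustration; written as escapes so
# the source file itself stays ASCII-only.
SAMPLES = {
    "A": "ASCII letter",
    "\u00e9": "Latin-1 range (e with acute accent)",
    "\u4e2d": "CJK ideograph inside the BMP",
}


def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    for ch, label in SAMPLES.items():
        print(f"U+{ord(ch):04X} {label}: {utf8_length(ch)} byte(s)")
```

Only the first sample is byte-for-byte identical to ASCII; everything else in the BMP needs two or three bytes, which is where the trouble described above starts.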

That's partly why people still use encodings other than utf-8. And
once you do, the same numeric character references can end up meaning
different things in different encodings (the HTML spec says they always
denote Unicode code points, but not every browser honours that, and
there aren't named entities in html for the 20,000+ Chinese
characters), and so show differently in the
browser. Added to which, as Mike Brown explained, it isn't entirely
clear what method browsers should (let alone do) use to determine
encodings. And then there are the sysadmins in CJK regions who try to
ward off user problems by locking all the browsers on their site down
to a specific encoding (often Shift-JIS in Japan), a trick also
sometimes perpetrated by ISPs when customising browsers for their
users. These aren't XSLT problems, but they are very real and
intractable issues that the consumers of your XSLT-generated html will
encounter if they are outside W Europe or the US, and XSLT authors
with i18n concerns have to face them.

One of the things XSLT is actually good at is mungeing xml into html
that suits the specific shortcomings of browsers that are out there
now, i.e. for saving people from the consequences of ignoring or
mis-implementing standards. OK, it would be better if that weren't
necessary, but I guess it will be for a long time yet.

Michael Beddow
