RE: Character entities

Subject: RE: Character entities
From: Mike Brown <mbrown@xxxxxxxxxxxxx>
Date: Mon, 14 Feb 2000 11:23:03 -0700
> Does HTML "know" UTF-8?

Like XML, HTML 4.0 is mostly defined in terms of UCS/Unicode characters,
which of course must be encoded. There is a mechanism for a document to
signal its own character encoding via a META declaration. A charset
parameter in an HTTP Content-Type header, if present, takes precedence
over the META declaration.
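
To make that precedence concrete, here is a rough Python sketch. The
function names and the regular expression are my own illustration, not
anything from the HTML recommendation, and a real user agent parses the
document far more carefully than a single regex can. It only assumes that
the META declaration looks like the usual
http-equiv="Content-Type" content="text/html; charset=..." form.

```python
import re
from email.message import Message

# Hypothetical pattern: finds charset=... inside the content attribute of
# a META tag. A deliberately loose approximation, not a real HTML parser.
META_CHARSET = re.compile(
    rb'<meta[^>]+content\s*=\s*["\'][^"\']*charset\s*=\s*([\w.:-]+)',
    re.IGNORECASE,
)

def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")

def charset_from_meta(raw):
    """Return the charset named in a META declaration, or None."""
    match = META_CHARSET.search(raw)
    return match.group(1).decode("ascii") if match else None

def pick_encoding(content_type, raw):
    """HTTP header charset wins; otherwise fall back to the META declaration."""
    return charset_from_header(content_type) or charset_from_meta(raw)
```

For example, pick_encoding("text/html", raw) falls back to whatever the
META declaration in raw says, while a header value that carries its own
charset parameter is used without looking at the document at all.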

Since HTML doesn't prescribe UTF-8 as a default and because the META
declaration can appear pretty far down in the document HEAD, the
recommendation states that only ASCII (U+0000 through U+007F) characters
should be used in the document up to that point.
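
If you want to check a document against that advice, something along
these lines might do as a first pass. Again this is only a sketch under
my own assumptions: ascii_up_to_meta is a made-up name, the regex is a
crude stand-in for real parsing, and treating "no META declaration found"
as passing is an arbitrary choice.

```python
import re

# Hypothetical pattern: the first META tag that mentions a charset.
META_TAG = re.compile(rb"<meta[^>]*charset[^>]*>", re.IGNORECASE)

def ascii_up_to_meta(raw):
    """True if every byte up to the end of the META declaration is ASCII."""
    match = META_TAG.search(raw)
    if match is None:
        return True  # no declaration found, so nothing to check
    prefix = raw[:match.end()]
    # ASCII bytes are exactly those below 0x80 (U+0000 through U+007F).
    return all(byte < 0x80 for byte in prefix)
```

A non-ASCII TITLE placed before the META element would make this return
False; moving the META declaration ahead of the TITLE fixes it.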

This stuff is discussed at
http://www.w3.org/TR/1999/REC-html401-19991224/charset.html#spec-char-encoding

It is worth pointing out that the value of the recommendation is only as
good as the user agents' support for it. The 4.0 browsers seem to do okay
with automatically selecting the proper encoding when interpreting a
document, but you may have noticed that they also let the user manually
choose it even if the document signaled its own encoding.


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

