Re: [xsl] I18N / UTF-8 versus US-ASCII

Subject: Re: [xsl] I18N / UTF-8 versus US-ASCII
From: David Carlisle <davidc@xxxxxxxxx>
Date: Tue, 4 Apr 2006 13:13:28 +0100
> It would be interesting to know if anyone who was using US-ASCII
> output had to switch to a broader encoding because of some issue....

I also tend to use US-ASCII for anything that might go near a web server,
although in that case I also set omit-xml-declaration="yes" so that the
files are well formed as UTF-8 documents and parsable by all XML
systems. An XML system may (and, if I recall correctly, some older
ones did, e.g. IE6 running under Windows 2000 in some configurations)
issue a fatal "unknown encoding" error if presented with a file saying
<?xml version="1.0" encoding="US-ASCII"?>
even though the same file would parse correctly if this line were
removed.
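
As a rough illustration of why dropping the declaration is safe, here is
a minimal Python sketch (not XSLT; the helper name to_ascii_xml and the
sample text are made up for the example). It writes a fragment using
only ASCII bytes plus numeric character references, with no XML
declaration, and then parses it back. With no declaration the parser
assumes UTF-8, and pure ASCII is valid UTF-8, so the original characters
come back intact.

```python
import xml.etree.ElementTree as ET


def to_ascii_xml(text):
    """Hypothetical helper: wrap text in <p> using ASCII bytes only.

    Markup-significant characters are escaped first; every non-ASCII
    character then becomes a decimal numeric character reference.
    No XML declaration is emitted.
    """
    body = text.replace("&", "&amp;").replace("<", "&lt;")
    return ("<p>" + body + "</p>").encode("ascii", errors="xmlcharrefreplace")


sample = "na\u00efve caf\u00e9 \u20ac5"
ascii_bytes = to_ascii_xml(sample)

# The bytes are plain ASCII, so they are also valid UTF-8.
ascii_bytes.decode("utf-8")

# No declaration is present, so the parser defaults to UTF-8 and
# resolves the character references back to the original text.
parsed = ET.fromstring(ascii_bytes)
print(ascii_bytes)
print(parsed.text == sample)
```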

Of course the other cases where you cannot use a restricted encoding
are cases where the element or attribute names use non-ASCII characters,
since character references are not allowed in names.

> The only disadvantage I'm aware of is that anyone reading the file is
> presented with the numeric character references instead of the
> characters themselves - this is not normally a problem as the only
> people who ever examine the XML itself are developers, users only see
> the parsed content (at which point all the references have been
> resolved).

File size can be an issue. People have long arguments about whether the
(typically) one, two or three bytes per character cost of UTF-8 is
better or worse than the typically 2 or 4 bytes per character of UTF-16.
Anyone who thinks that is an issue worthy of consideration isn't going
to like the idea of using character references.
Encoding characters as &#x1234; costs 8 bytes per character (or 9 or 10
if you need higher planes). For some document types, languages and
document distribution methods, making the file 4 times bigger than it
would be in UTF-16 isn't really an option.
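
The arithmetic can be checked with a short Python sketch. The function
name encoded_sizes and the sample strings are assumptions made for
illustration; the character-reference form uses hexadecimal references
in the &#x1234; style mentioned above, and UTF-16 is counted without a
byte order mark.

```python
def encoded_sizes(text):
    """Return (utf8, utf16, char_refs) byte counts for text.

    char_refs is the size of an ASCII encoding in which every
    non-ASCII character is written as a hex reference like &#x1234;.
    """
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-be"))  # big-endian, no BOM
    refs = "".join(
        c if ord(c) < 128 else "&#x{:X};".format(ord(c)) for c in text
    )
    return utf8, utf16, len(refs.encode("ascii"))


# 100 copies of U+1234, a character in the Basic Multilingual Plane.
bmp = encoded_sizes("\u1234" * 100)
print(bmp)  # (300, 200, 800): references are 4x the UTF-16 size

# One character from a higher plane needs a 9-byte reference.
print(encoded_sizes("\U00012345"))  # (4, 4, 9)
```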

David
