Re: [xsl] Global parameters with UTF-8 characters and ???s <Disregard Previous>

Subject: Re: [xsl] Global parameters with UTF-8 characters and ???s <Disregard Previous>
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 3 Aug 2006 10:32:24 +0100
> Setting the output encoding to "US-ASCII" works.  I no longer see the
> question marks.
> Is this the right solution?  Or does it just point out what the issue
> is?

Most (but not all) encodings in use today encode 0-9, a-z and A-Z with
the same byte values, so if you ensure that your file contains only
these (and some punctuation) then most of the time the file will work
no matter what encoding it is declared as having.  If I'm generating
files that other people may put on web servers I usually try to use
us-ascii as the encoding (and, if using xslt2, omit the encoding
declaration so that the file is taken as utf8, which is also correct
since ascii files are also valid utf8 files).

If your file uses non-ascii characters then you need to declare the
correct encoding.  Most likely your files were in utf8 but your web
server was declaring them to be iso-8859-1.  You can check that by
looking in your browser (view/character encoding in firefox, something
similar in IE).  If the file displays incorrectly, but manually
changing the encoding from that menu makes it display correctly, then
almost certainly the server is specifying the wrong encoding to the
browser.  (Most web servers do _not_ look at the file to determine what
encoding to specify; they just use a site or directory default encoding
for that file type.)
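
To make the two cases above concrete, here is a small Python sketch
(Python rather than XSLT only because it is easy to run; the sample
strings and the helper name decode_as are made up for illustration).
It checks that an ascii-only string produces identical bytes in
us-ascii, iso-8859-1 and utf8, and then shows what a reader sees when
utf8 bytes are read as if they were iso-8859-1.

```python
# Illustrative sketch only: sample strings and names are assumptions.
ASCII_TEXT = "Hello, XSLT 2.0"
ACCENTED_TEXT = "caf\u00e9"  # "cafe" with an e-acute, escaped so this source stays ascii


def decode_as(data: bytes, declared: str) -> str:
    """Decode bytes the way a browser would if told `declared` is the encoding."""
    return data.decode(declared)


# An ascii-only string gives the same bytes under all three encodings,
# so a wrong declaration among them does no harm.
ascii_bytes = {
    enc: ASCII_TEXT.encode(enc) for enc in ("us-ascii", "iso-8859-1", "utf-8")
}
same_bytes = len(set(ascii_bytes.values())) == 1

# A utf8 file that the server declares as iso-8859-1: each byte of the
# two-byte e-acute sequence is shown as a separate Latin-1 character.
utf8_bytes = ACCENTED_TEXT.encode("utf-8")
wrong = decode_as(utf8_bytes, "iso-8859-1")
right = decode_as(utf8_bytes, "utf-8")

print(same_bytes)
print(ascii(wrong))  # escaped form, so the output itself is pure ascii
print(ascii(right))
```

Switching the browser's encoding menu by hand corresponds to the
second decode_as call: same bytes, different declared encoding.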

Specifying US-ASCII is a good solution for some kinds of files in some
work scenarios, but not all.

* when it works, it works, and is very simple to do.


* the encoding is rather inefficient: an e-acute is one byte in
  iso-8859-1, 2 bytes in utf8, but at least 6 bytes (&#xe9;) in
  us-ascii.  So if your file is English with the occasional
  non-breaking space or currency symbol, it's not too bad to have the
  occasional character encoded this way, but in some languages your
  file ends up 5 or 6 times larger.

* you cannot use the mechanism at all if non-ascii characters are used
  in places where the &# notation is not available, so if any such
  characters appear in comments, or in processing instructions, or in
  element or attribute names, this is not an option at all.
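
The byte counts and the restriction on names in the list above can be
checked with a short Python sketch.  The assumptions: Python's
"xmlcharrefreplace" error handler stands in for what an XSLT
serializer does when the output encoding is us-ascii (it writes the
decimal form &#233; rather than &#xe9;, which happens to be the same
length), and the helper is_well_formed and the sample documents are
made up for illustration.

```python
import xml.etree.ElementTree as ET

E_ACUTE = "\u00e9"  # e-acute, escaped so this source stays ascii

# Bytes needed for one e-acute under each output encoding.
sizes = {
    "iso-8859-1": len(E_ACUTE.encode("iso-8859-1")),
    "utf-8": len(E_ACUTE.encode("utf-8")),
    # us-ascii cannot hold the character, so it becomes a character reference
    "us-ascii": len(E_ACUTE.encode("ascii", errors="xmlcharrefreplace")),
}


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A character reference is fine in text content ...
in_text = is_well_formed("<p>caf&#233;</p>")
# ... but is not allowed inside an element name, so a non-ascii name
# cannot be written in a us-ascii file at all.
in_name = is_well_formed("<caf&#233;/>")

print(sizes)
print(in_text, in_name)
```

In comments and processing instructions the reference would not be
rejected by a parser, but it would not be expanded either: the reader
would get the literal characters &#233; rather than an e-acute.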

