Re: [xsl] Recognized Unicode characters?

Subject: Re: [xsl] Recognized Unicode characters?
From: Geert Josten <Geert.Josten@xxxxxxxxxxx>
Date: Mon, 09 May 2005 15:48:46 +0200
Hi,

Maybe your browser's default font doesn't support the character you are trying to display. I cannot reproduce the problem. With output method HTML, the XSLT processor (Xalan) converts character 8212 to &mdash; when writing US-ASCII, and to the corresponding UTF-8 byte sequence when writing UTF-8. I see either garbage (when the output is UTF-8 and there is no meta tag specifying the encoding) or just the character you are looking for. I did not see the square box in any of the cases I tested...

Cheers,
Geert
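Geert's three outcomes can be reproduced outside XSLT. The Python sketch below is an illustration only, not Xalan's serializer. The hypothetical helper `serialize_html_ascii` assumes that an ASCII-only HTML serializer writes a named entity such as &mdash; where HTML 4 defines one and a numeric reference otherwise. The last step assumes a browser that falls back to windows-1252 when no meta tag declares the encoding.

```python
from html.entities import codepoint2name

EM_DASH = "\u2014"  # character 8212


def serialize_html_ascii(text: str) -> str:
    """Hypothetical stand-in for an HTML serializer writing US-ASCII.

    ASCII characters pass through unchanged (escaping of & and < is
    omitted for brevity); any other character becomes &name; when HTML 4
    defines a name for it, otherwise a numeric reference &#N;.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


text = f"a{EM_DASH}b"

# Case 1: US-ASCII output, so the em dash is escaped by name.
print(serialize_html_ascii(text))  # a&mdash;b

# Case 2: UTF-8 output, so the em dash is written as three raw bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'a\xe2\x80\x94b'

# Case 3: those bytes decoded as windows-1252 (no meta tag) give garbage:
# one character per byte instead of a single em dash.
garbled = utf8_bytes.decode("cp1252")
print([f"U+{ord(c):04X}" for c in garbled[1:-1]])  # ['U+00E2', 'U+20AC', 'U+201D']
```

A square box is not among these outcomes, which is why the font, rather than the encoding, is the first suspect.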

Thanks for responding, but I think you guys lost me.
Here is the XSLT header I used:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>


I set the output method to HTML because that is the output I am creating. (Isn't this right?)

As for the encoding, I have to admit I am confused. I picked UTF-8 mostly because books and websites for learning XML generally recommend it (and because it is the default), but none of those I have seen explain in any detail why, or why anyone might use something different. The special characters in my source XML file are all numeric character references to Unicode code points (&#___;, etc.)

As I understand it, shouldn't the XSLT processor know from the "encoding" attribute that the references are to Unicode code points, and read them correctly as those characters? I also understand that the processor has some flexibility in how it outputs text, and that it will often output special characters as entity references (e.g., the "&" symbol as "&amp;").

So I am still confused: why won't the Unicode character reference &#8212; output correctly? The output displays a square box both in the browser (IE6) and in the HTML source itself (viewed in Windows Notepad).
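On the question about the encoding attribute: in XML a numeric character reference such as &#8212; always denotes a Unicode code point, whatever the encoding declaration says. The declaration (encoding="UTF-8" in the XML declaration) only tells the parser how to decode the bytes of that file. The Python sketch below uses the standard-library ElementTree parser as a stand-in for the XSLT processor's parser (an assumption; Xalan uses its own). The helper name `parse_dash` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The same markup under three encoding declarations. The bytes are pure
# ASCII, so the document is well formed under each of them.
DECLARATIONS = ["UTF-8", "ISO-8859-1", "US-ASCII"]


def parse_dash(encoding: str) -> str:
    """Parse a one-element document and return its text content."""
    source = f'<?xml version="1.0" encoding="{encoding}"?><p>a&#8212;b</p>'
    # Feed bytes, not str, so the parser honours the declaration.
    root = ET.fromstring(source.encode("ascii"))
    return root.text


for enc in DECLARATIONS:
    print(enc, [f"U+{ord(c):04X}" for c in parse_dash(enc)])
# Every line ends with ['U+0061', 'U+2014', 'U+0062']
```

In each case the reference is resolved to U+2014 while parsing, so by the time the stylesheet runs the source already contains the em dash itself. Whether it survives is decided by how the output is serialized and displayed.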

> > > Shouldn't that be <xsl:output encoding="US-ASCII"... for safety?
> >
> > Neither is completely safe of course,
>

The spec only requires support for UTF-8 and UTF-16; anything else is
optional.

I personally use "iso-646" as the name of this encoding. The differences are
immaterial (different names for some of the characters, I believe) but I
prefer international standards as a matter of principle.


Michael Kay
http://www.saxonica.com/
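Here is a short Python sketch of the trade-off discussed above. It uses Python's codec registry as a stand-in for a processor's own encoding table. This is an assumption: which names a given XSLT processor accepts varies, and as noted the spec only requires UTF-8 and UTF-16. The sketch shows that "US-ASCII" and its IANA-registered alias "ISO646-US" resolve to the same codec. It also shows that encoding with `xmlcharrefreplace` writes the em dash as a character reference rather than failing. The helper `encode_ascii_safe` is hypothetical.

```python
import codecs

NAMES = ["US-ASCII", "ISO646-US"]


def encode_ascii_safe(text: str, encoding: str) -> bytes:
    """Encode text, writing unrepresentable characters as &#N; references."""
    return text.encode(encoding, errors="xmlcharrefreplace")


for name in NAMES:
    codec = codecs.lookup(name).name
    print(name, codec, encode_ascii_safe("a\u2014b", name))
# US-ASCII ascii b'a&#8212;b'
# ISO646-US ascii b'a&#8212;b'
```

This is the "safe" property of an ASCII output encoding. What it does not guarantee is that every processor recognizes the name you give it.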




-- 
=====================================
NB: since 22 April, the Daidalos office has been located at a new address:

Daidalos BV
Hoekeindsehof 1 - 4
2665 JZ Bleiswijk
tel: +31 (0)10 850 12 00
fax: +31 (0)10 850 11 99

The above address is also the postal address.
======================
Geert.Josten@xxxxxxxxxxx
IT-consultant at Daidalos BV

http://www.daidalos.nl/

GPG: 1024D/12DEBB50
