Re: Character references, entities, XSL and cocoon

Subject: Re: Character references, entities, XSL and cocoon
From: Elliotte Rusty Harold <elharo@xxxxxxxxxxxxxxx>
Date: Fri, 10 Sep 1999 07:52:33 -0400
>Cross-posted to xml-l.  Please excuse the duplication.
>
>Hello colleagues,
>
>I'm creating an xml version of an art theory scholarly manuscript that
>includes ancient greek characters (with breathing marks, accents, etc.)
>I've run into some problems and would appreciate any help you could provide.
>I decided to use unicode character references for the ancient greek
>characters.  With IE5 (newly equipped with the Athena font) the characters
>were successfully rendered on my screen using CSS (question 1 -- although
>they would not print! why?).  However, I need to make this project
>accessible to a broader audience than IE5 users, so I've begun work with
>Cocoon, an Apache/Jserv servlet that will transform my XML into HTML using
>XSLT.
>
>Okay so far, but the character references in my xml document show up in the
>transformed HTML document as entity references, not rendered greek.  (Some
>character references show up as question marks -- is this the parser or
>processor not able to recognize less common unicode characters?)  Anyway,
>I'd very much appreciate help in understanding what's going on, and
>information about how I can pass my XML character references to the
>transformed HTML document.
>

I've encountered this myself. For instance see
http://metalab.unc.edu/xml/books/bible/errata/05.html for just another
example of the problem.  The issue is that although HTML 4.0 defines many
entities like &Omega; for capital Greek omega, browsers generally don't yet
support these entity references. There's not a lot you can do about this in
the general case. For an occasional word or quotation, I just use the
references anyway and hope that readers will understand. For a longer
passage, you can try using an output encoding like UTF-8 or ISO-8859-7 that
actually includes the characters you want. Then you'd put a META tag in
your header like this:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-7">

Not all browsers will pick this up, or be able to display the right
character set even if they do recognize it; but some will.
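
Before settling on an output encoding, it helps to check that it actually
includes the characters you want. The sketch below does that in Python. The
two sample references and the helper name `encodable` are assumptions made
for illustration; nothing here is part of Cocoon or of any XSLT processor.
Keep in mind that ISO-8859-7 covers modern (monotonic) Greek only. The
precomposed letters with breathing marks sit in Unicode's Greek Extended
block, so for those UTF-8 is the likelier choice.

```python
import html


def encodable(text, charset):
    """Return True if every character in text can be written in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample: numeric character references as they might appear
# in the XML source (capital omega, and alpha with smooth breathing).
source = "&#x3A9; &#x1F00;"

# Resolve the references to the actual Unicode characters.
text = html.unescape(source)

# Report, per candidate output encoding, whether the whole sample fits.
for charset in ("iso-8859-7", "utf-8"):
    print(f"{charset}: {encodable(text, charset)}")
```

If the check fails for ISO-8859-7, the same META tag with charset=UTF-8 is
the variant to try.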


+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@xxxxxxxxxxxxxxx | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

