Re: [xsl] encoding and XSL Transformation

Subject: Re: [xsl] encoding and XSL Transformation
From: Chuck White <chuckwh@xxxxxxxxxxx>
Date: Wed, 11 Sep 2002 11:39:26 -0700
----- Original Message -----
From: "David Carlisle" <davidc@xxxxxxxxx>


> the XML Notation &#146; _always_ denotes the Unicode character 146
> whatever the encoding of the file.
>

Right, but we're not talking about output, we're talking about input.

In HTML, the following document fragment:

<html>
<head>
<title>Untitled Document</title>
</head>

<body bgcolor="#FFFFFF" text="#000000">
&#146;
</body>
</html>

renders as a single quote in a Windows-based browser and in Netscape on the
Mac, but as an i acute in Macromedia Ultradev on a Macintosh. This is why
people get confused. &#146; is &#146; is &#146;, but a user has no control
over how various implementations handle it.
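
One plausible reading of those three results is that each application maps
the number 146 (hex 92) through a different legacy code page instead of
treating it as a Unicode code point. The following is only a sketch of that
idea in Python; the codec names are Python's own, and the claim that any
particular browser or editor works this way internally is my assumption.

```python
# The single byte 0x92 (decimal 146) decoded under three legacy assumptions.
RAW = bytes([0x92])

def decode_as(encoding: str) -> str:
    """Decode the single byte 0x92 using the named Python codec."""
    return RAW.decode(encoding)

windows = decode_as("cp1252")    # Windows-1252
mac = decode_as("mac_roman")     # classic Mac OS Roman
latin1 = decode_as("latin-1")    # ISO 8859-1, identical to Unicode for 0-255

for label, ch in (("cp1252", windows), ("mac_roman", mac), ("latin-1", latin1)):
    # Print the Unicode code point each interpretation arrives at.
    print(f"{label:10} U+{ord(ch):04X}")
```

Windows-1252 yields the right single quote (U+2019), Mac Roman yields i acute
(U+00ED), and only the Latin-1 reading yields U+0092, the control character
that &#146; strictly denotes.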

If a user receives a document with a single quote and somehow manages to
convert it to &#146;, they're going to wonder why, after they feed it in and
transform it to XML using utf-8, it renders as an outlined empty box, a
question mark, or some other character. Then, when they change the encoding
in the XSLT document to us-ascii, they see it rendering as a single quote
again (maybe).
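
A likely explanation for the utf-8 versus us-ascii difference is what the
serializer has to write for U+0092 in each case. This is a sketch in Python
rather than Xalan itself, so treat it as an illustration of the mechanism and
not as a description of Xalan's internals.

```python
# U+0092 is the character that &#146; denotes in XML.
ch = "\u0092"

# A UTF-8 serializer can write the character directly, as raw bytes.
as_utf8 = ch.encode("utf-8")

# An ASCII serializer cannot, so it must fall back to a numeric
# character reference, which is what Python's xmlcharrefreplace does.
as_ascii = ch.encode("ascii", errors="xmlcharrefreplace")

print(as_utf8)
print(as_ascii)
```

With us-ascii output the reference &#146; survives into the result, and a
lenient browser may again show it as a quote. With utf-8 output the control
character is written as raw bytes, which is where the box or question mark
tends to come from.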

The original question was "Does anyone know how to get the Xalan parser to
properly transform these characters to their proper hex value?"

The lofty answer is to say that XML is a sequence of UCS characters, that
there is simply no other answer, and that they're already transformed to
their proper hex value. Of course that's true, but it doesn't help users
understand what's going on. If someone receives a document with &#146; and
thinks they should be getting a single right quote, the reason isn't a lack
of intellectual capacity on the part of the user, it's that software
developers keep changing the rules.

Unicode has solved the world's problems, but it hasn't removed legacy
software from users' systems yet. The bottom line on the original poster's
question is that, before the document is brought in as an XML document, he
needs to convert the single quote to the Unicode representation for that
single quote. That is not \u0092 or &#146;, but, for XML purposes, either
&#x2019; in hex or &#8217; in decimal format. Hence the need for the kind of
link I indicated that references this kind of thing, or for apps like
Unipad.
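
For what it's worth, that pre-conversion step could be scripted along these
lines. This is a minimal sketch in Python, not anything Xalan provides: the
function name fix_cp1252_refs is made up for illustration, and it assumes the
references in the 128-159 range were meant as Windows-1252 characters.

```python
import re

# Matches decimal (&#146;) and hex (&#x92;) numeric character references.
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def fix_cp1252_refs(text: str) -> str:
    """Rewrite references in the 0x80-0x9F range to the Unicode
    character that Windows-1252 assigns to that byte value."""
    def repl(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not 0x80 <= code <= 0x9F:
            return match.group(0)   # outside the problem range, leave alone
        try:
            ch = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)   # byte is unassigned in Python's cp1252
        return f"&#x{ord(ch):X};"
    return NCR.sub(repl, text)

print(fix_cp1252_refs("it&#146;s"))
```

Run over the source before it reaches the XML parser, this turns &#146; into
&#x2019; and leaves references outside that range untouched.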


Cheers,

Charles White
The Tumeric Partnership
http://www.tumeric.net
chuck@xxxxxxxxxxx
http://www.javertising.com
________________________________________
Author, Mastering XSLT, Sybex Books
Co-Author, Mastering XML, Premium Edition, Sybex Books


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

