Subject: Re: [xsl] encoding and XSL Transformation
From: Chuck White <chuckwh@xxxxxxxxxxx>
Date: Wed, 11 Sep 2002 11:39:26 -0700
----- Original Message -----
From: "David Carlisle" <davidc@xxxxxxxxx>

> the XML notation &#146; _always_ denotes the Unicode character 146
> whatever the encoding of the file.

Right, but we're not talking about output, we're talking about input. In HTML, the following document fragment:

<html>
<head>
<title>Untitled Document</title>
</head>
<body bgcolor="#FFFFFF" text="#000000">
&#146;
</body>
</html>

renders in a Windows-based browser as a single quote. However, it renders in Macromedia Ultradev as i acute on a Macintosh, and on Netscape - Mac as a single quote. This is why people get confused. &#146; is &#146; is &#146;, but a user has no control over how various implementations handle it.

If a user receives a document with a single quote and somehow manages to convert it to &#146;, they're going to wonder why it renders as a single outlined empty box, or as a question mark or some other character, when they input it and then transform it to XML using utf-8. Then, when they change the encoding in the XSLT document to us-ascii, they again see it rendering as a single quote (maybe).

The original question was "Does anyone know how to get the Xalan parser to properly transform these characters to their proper hex value?" The lofty answer is to say that XML is a sequence of UCS characters, and that's that, that there is simply no other answer, and that they're already transformed to their proper hex value. Of course that's true, but it doesn't help users understand what's going on. If someone receives a document with &#146; and thinks they should be getting a single right quote, the reason isn't a lack of intellectual capacity on the part of the user, it's that software developers keep changing the rules. Unicode has solved the world's problems, but it hasn't removed legacy software from users' systems yet.
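To make the platform difference concrete, here is a minimal Python sketch (not anything Xalan does, just an illustration). It assumes the lenient Windows and Mac implementations are in effect treating 146 as the byte 0x92 in their native code pages, which is consistent with the renderings described above but is an assumption on my part.

```python
import unicodedata

# Code point 146 (U+0092) is a C1 control character, not a quotation mark.
c1 = chr(146)
print(unicodedata.category(c1))  # general category of U+0092

# The byte 0x92 decoded as Windows-1252 is the right single quotation mark.
quote = bytes([0x92]).decode("cp1252")
print(f"U+{ord(quote):04X}", unicodedata.name(quote))

# The same byte decoded as Mac OS Roman is a different character, which is
# one way identical input can end up rendering differently per platform.
mac = bytes([0x92]).decode("mac_roman")
print(f"U+{ord(mac):04X}", unicodedata.name(mac))
```

The second and third results are U+2019 (right single quotation mark) and U+00ED (i acute), matching the quote seen on Windows and the i acute seen on the Mac.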
The bottom line on the original poster's question is that before the document is brought in as an XML document, he needs to convert the single quote to the Unicode representation for that single quote. That is not \u0092 or &#146;, but, for XML purposes, either &#x2019; in hex or &#8217; in decimal format. Hence the need for the kind of link I indicated that references this kind of thing, or for apps like Unipad.

Cheers,

Charles White
The Tumeric Partnership
http://www.tumeric.net
chuck@xxxxxxxxxxx
http://www.javertising.com
________________________________________
Author, Mastering XSLT, Sybex Books
Co-Author, Mastering XML, Premium Edition, Sybex Books

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
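P.S. A rough sketch of that pre-conversion step, in Python, for anyone who wants to script it. The function name fix_c1_references is made up for this example, and it assumes the stray references in the 128-159 range really came from Windows-1252 text; if the source used another code page, the mapping would be wrong.

```python
import re

# Matches decimal (&#146;) or hexadecimal (&#x92;) numeric character references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def fix_c1_references(text: str) -> str:
    """Rewrite references to code points 128-159 as the Windows-1252
    character they most likely stood for, in hex reference form.
    (Hypothetical helper; assumes Windows-1252 origin.)"""

    def repl(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if 0x80 <= code <= 0x9F:
            try:
                ch = bytes([code]).decode("cp1252")
            except UnicodeDecodeError:
                # A few bytes are undefined in Windows-1252; leave them alone.
                return match.group(0)
            return f"&#x{ord(ch):X};"
        # Anything outside the C1 range is already unambiguous.
        return match.group(0)

    return _NCR.sub(repl, text)


print(fix_c1_references("It&#146;s here"))  # It&#x2019;s here
```

Run this over the text before handing it to the XML parser; references outside the 128-159 range pass through untouched.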