Subject: Re: [xsl] special characters in xml text paramter
From: Mike Brown <mike@xxxxxxxx>
Date: Wed, 20 Nov 2002 14:59:45 -0700 (MST)

Alice Fan wrote:
> but how do i convince my browser? It doesn't have anything to do with
> browser versions right?

DECLARING THE ENCODING OF AN HTML DOCUMENT

Remember, a document is just a bunch of bytes when the browser (the HTML user agent, to be more general) reads it off the network or disk. The browser has to figure out how those bytes map to characters: the encoding.

You are supposed to tell the browser what the HTML document's encoding is by putting that info in the "charset" parameter of the Content-Type header of the HTTP response message that delivers the HTML. You can control this through whatever mechanism your HTTP server offers for doing so.

HTML also provides a facility for embedding the same info in the HTML document itself, and this is what most people do, rather than messing with the HTTP server. In the document head, right after the title, they put:

  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

If you are using the HTML output method in your XSLT processor, then it normally (although this is not a requirement) will add the meta tag to the document head for you. If it's not doing this, then add it yourself, via your stylesheet.

Of course, your browser has to be smart enough to honor this info, and you must not do anything to your browser to override its ability to do so. Browsers often do let you override their behavior, so that you can correctly view a document that has a misdeclared or undeclared encoding. For example, many iso-2022-jp documents are served up as if they were iso-8859-1, so Japanese users keep their browsers set to ignore the declared encoding and always use iso-2022-jp instead.

One thing you may have forgotten to do is tell the XSLT processor your desired output encoding. For example, in your stylesheet,

  <xsl:output method="html" encoding="iso-8859-1"/>

would give you iso-8859-1 encoded output, where there is just 1 byte per character.
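
To show where that declaration sits, here is a minimal stylesheet sketch. It is only an illustration under assumptions: the input element names (doc, para) and the page title are hypothetical, and whether the meta tag appears in the result without your help depends on your processor.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the processor to serialize the result as HTML in iso-8859-1. -->
  <xsl:output method="html" encoding="iso-8859-1"/>

  <!-- Hypothetical input: <doc><para>some text</para></doc> -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Encoding example</title>
        <!-- If your processor does not add its own charset declaration,
             put one here as a literal result element:
             <meta http-equiv="Content-Type"
                   content="text/html; charset=iso-8859-1"/> -->
      </head>
      <body>
        <p><xsl:value-of select="/doc/para"/></p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Whatever value you choose for the encoding attribute, the charset declared in the meta tag or the HTTP header has to name the same encoding, or the browser will decode the bytes incorrectly.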

With iso-8859-1, characters beyond the first 256 code points of Unicode are not representable directly as bytes, so they will be emitted by the XSLT processor as character entity references ("&copy;") or numeric character references ("&#169;"). Depending on the XSLT processor, the upper 128 of those 256 may also be emitted as character entity references, in order to retain compatibility with Netscape 4.x, which is horribly nonconformant in its handling of single-byte document encodings.

Generally, it is safe to use utf-8 as the output encoding. It gives you the full range of Unicode directly as 1 to 4 bytes per character, obviating the need for character references or entity references. As long as you declare the charset in the meta tag or in the transport, and the browser is not completely brain-dead, the document's bytes will be decoded correctly. The same cannot be said for your generic text editor.

   - Mike
____________________________________________________________________________
  mike j. brown                   |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list