Subject: Re: [xsl] Re: Character 150 withs Windows-1252 output
From: "andrew welch" <andrew.j.welch@xxxxxxxxx>
Date: Fri, 21 Apr 2006 13:20:17 +0100
On 4/21/06, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > Reading around a bit 150 is a control character... so does
> > that mean it shouldn't appear in source XML document
> > (unresolved) where the encoding is specified as ISO-8859-1 ??
>
> I believe that in the ISO standard ISO 8859/1, the control blocks C0 and C1
> (which includes 150) are unused - they are not part of the character set.
> However, according to Wikipedia [1], "the character map ISO_8859-1:1987,
> more commonly known by its preferred MIME name of ISO-8859-1 ... assigns the
> C0 and C1 control characters to the code values 00-1F, 7F, and 80-9F.
>
> The XML recommendation defines encodings in terms of their IANA definitions
> not their ISO definitions, so on that basis ISO-8859-1 does include the
> control character 150.
>
> In XML 1.1, there is a requirement that C0 and C1 characters (with obvious
> exceptions such as TAB) must be represented as character references. This is
> primarily to catch the common error where a Windows 1252 file is mislabelled
> as ISO-8859-1.
>
> [1] http://en.wikipedia.org/wiki/ISO_8859-1

Thanks for the info.

Based on that, given this stylesheet:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output encoding="ISO-8859-1" method="xml"/>
  <xsl:template match="/">
    <foo>&#150;</foo>
  </xsl:template>
</xsl:stylesheet>

The output differs between MSXML 3/4, Saxon 6.5.4 and Saxon 8.7.1. The
latter escapes the character back to &#150;, while the three XSLT 1.0
processors all output the character itself.

I'm guessing this is due to XML 1.1 support in Saxon 8.7?
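
For anyone who wants to see the mislabelling problem described above in isolation, here is a minimal Python sketch (not part of the original exchange, and it says nothing about what any XSLT processor does internally). It assumes Python's built-in "iso-8859-1" and "cp1252" codecs, and simply shows that the single byte 150 decodes to a C1 control character under one label and to an en dash under the other.

```python
# Byte 150 (0x96) decoded under the two encodings discussed in this thread.
raw = bytes([150])

# IANA-style ISO-8859-1 maps 0x80-0x9F straight onto the C1 controls.
as_latin1 = raw.decode("iso-8859-1")

# Windows-1252 reuses the same byte for a printable character, the en dash.
as_cp1252 = raw.decode("cp1252")

print(f"ISO-8859-1:   U+{ord(as_latin1):04X}")
print(f"Windows-1252: U+{ord(as_cp1252):04X}")

# The en dash itself (U+2013) has no byte in ISO-8859-1, so writing it in
# that encoding requires a numeric character reference instead.
escaped = "\u2013".encode("iso-8859-1", errors="xmlcharrefreplace")
print(escaped)
```

Note that a character reference in XML always names a Unicode code point, so &#150; means U+0096 (the control character) whatever the document's encoding declaration says; an en dash would be written as &#8211;.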