RE: [xsl] Integrating a Search and Replace template with the CSV to XML converter

From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 3 Jun 2008 08:43:30 +0100

> The characters that are affecting things are part of the
> UNICODE set 'General Punctuation'. This is translating 
> through the stylesheet fine and is being displayed in the 
> resulting XML by &#146; (right hand quote) and &#150; (en 
> dash). Problem is, my dynamic website does not know how to 
> display these characters, and I am getting the little boxes instead.

It's not surprising that it doesn't know how to display them, since neither
of these codepoints is assigned to any printable Unicode character: decimal
146 and 150 (x92 and x96) fall in the C1 control range. The Unicode
codepoint for en dash is x2013 (&#x2013; or &#8211;); the code for "right
single quotation mark" is x2019 (&#x2019; or &#8217;).

What has happened is that your input uses the Microsoft-proprietary cp1252
character encoding, in which bytes x92 and x96 are the right single
quotation mark and the en dash. There's no harm in that, provided that the
software reading the file knows it's in this encoding, so that it can
translate such characters to their proper Unicode values for use in the
output XML.
> 
> I am thinking of integrating a Global Search and Replace 
> template that runs on the final XML to find all instances of 
> &#146; and replace with ' .

No, you should fix the problem at source rather than patching it up later.
If you're reading the CSV file using unparsed-text(), and if the CSV file is
in cp1252 encoding, then you can specify this in the encoding parameter to
unparsed-text().
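
As a minimal sketch of what that could look like in XSLT 2.0 (the parameter
names, the file name input.csv, the template name "main" and the rows/row
output elements are all hypothetical, not taken from your stylesheet; it
also assumes your processor recognises the encoding name 'windows-1252',
since processors are only required to support UTF-8 and UTF-16):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: location of the CSV input file,
       resolved relative to the stylesheet. -->
  <xsl:param name="csv-uri" select="'input.csv'"/>

  <!-- Assumption: the processor accepts 'windows-1252', the IANA name
       for cp1252. Support for this encoding is processor-dependent. -->
  <xsl:param name="csv-encoding" select="'windows-1252'"/>

  <!-- Serialize as UTF-8 so U+2019 and U+2013 are written as real characters. -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template name="main">
    <!-- Passing the encoding as the second argument makes the processor
         decode byte x92 as U+2019 and byte x96 as U+2013. -->
    <xsl:variable name="text"
        select="unparsed-text($csv-uri, $csv-encoding)"/>
    <rows>
      <!-- Split into lines and skip blank ones; the real CSV parsing
           would replace this placeholder loop. -->
      <xsl:for-each select="tokenize($text, '\r?\n')[normalize-space()]">
        <row><xsl:value-of select="."/></row>
      </xsl:for-each>
    </rows>
  </xsl:template>

</xsl:stylesheet>
```

With the input decoded correctly at this point, the output should contain
the proper Unicode characters and no search-and-replace pass is needed.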

Michael Kay
http://www.saxonica.com/
