Subject: RE: [xsl] Integrating a Search and Replace template with the CSV to XML converter
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 3 Jun 2008 08:43:30 +0100

> The characters that are affecting things are part of the
> UNICODE set 'General Punctuation'. This is translating
> through the stylesheet fine and is being displayed in the
> resulting XML by ’ (right hand quote) and – (en
> dash). Problem is, my dynamic website does not know how to
> display these characters, and I am getting the little boxes instead.

It's not surprising that it doesn't know how to display them, since neither of these codepoints is assigned to any printable Unicode character. The Unicode codepoint for en dash is x2013; the code for "right single quotation mark" is x2019.

What has happened is that your input uses the Microsoft-proprietary cp1252 character encoding. There's no harm in that, provided that the software reading the file knows it's in this encoding, so that it can translate such characters to their proper Unicode values for use in the output XML.

> I am thinking of integrating a Global Search and Replace
> template that runs on the final XML to find all instances of
> ’ and replace with ' .

No, you should fix the problem at source rather than patching it up later. If you're reading the CSV file using unparsed-text(), and if the CSV file is in cp1252 encoding, then you can specify this in the encoding parameter to unparsed-text().

Michael Kay
http://www.saxonica.com/
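
The diagnosis above can be illustrated outside XSLT. The following is a minimal Python sketch, not part of the original exchange. It assumes the CSV contained the cp1252 bytes 0x92 and 0x96 and that the file was read as if it were ISO-8859-1; the variable names and the `codepoints` helper are made up for the illustration.

```python
# Two bytes as they might appear in a cp1252-encoded CSV file (assumed input):
# 0x92 is the right single quotation mark, 0x96 is the en dash in cp1252.
raw = b"\x92\x96"

# Decoding with the correct encoding yields the proper Unicode characters.
right = raw.decode("cp1252")

# Decoding as ISO-8859-1 (the assumed misreading) maps each byte to the
# code point with the same number, which falls in the C1 control range.
wrong = raw.decode("latin-1")


def codepoints(text):
    """Return the code points of text in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(codepoints(right))  # ['U+2019', 'U+2013']
print(codepoints(wrong))  # ['U+0092', 'U+0096']
```

The second result is the kind of unprintable code point that shows up as little boxes. Declaring the encoding when the file is read, as with the encoding parameter to unparsed-text(), corresponds to the first decode call.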