Re: [xsl] UTF-8, RTF and XSLT

Subject: Re: [xsl] UTF-8, RTF and XSLT
From: Russell Kohn <russ@xxxxxxxxxxxx>
Date: Fri, 8 Nov 2002 07:53:26 -0800
At 9:27 AM +0000 11/8/02, David Carlisle wrote:
RTF isn't XML so you would be better using text output than xml, also if
you don't want non-ASCII characters encoded in UTF-8 then specify a
different encoding, e.g. latin1, so...

<xsl:output method="text" encoding="iso-8859-1"/>

Hi David,


Yes. However, I think I may have been unclear before, so let me try again...


My source XML looks something like this (I'm simplifying here):


<?xml version="1.0" encoding="UTF-8" ?>
<resultset>
<data>theData</data>
<data>someMoredata</data>
<data>youGetTheIdea</data>
</resultset>

Now, let's assume theData contains the single character Å (&#197;), which should render as a capital A with a ring over the top.

If I open the raw XML file in a browser or other application that can read UTF-8 natively, this renders fine. If I open it in a plain text editor that does not understand UTF-8, theData appears as two seemingly arbitrary characters (the two bytes of the UTF-8 encoding) and not as a single Å.
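
For illustration only, the symptom can be reproduced outside of XSLT. The following is a minimal Python sketch, under the assumption that the text editor interprets the file as Windows-1252; that choice of encoding is a guess, not something established above. It shows that Å is a single character but two bytes in UTF-8, and only one byte in ISO-8859-1.

```python
# Illustration only: how U+00C5 looks at the byte level.
ch = "\u00c5"  # LATIN CAPITAL LETTER A WITH RING ABOVE

utf8_bytes = ch.encode("utf-8")         # what the source file contains
latin1_bytes = ch.encode("iso-8859-1")  # what a Latin-1 output would contain

# Assumption: the editor decodes the file as Windows-1252 instead of UTF-8.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex())    # c385
print(latin1_bytes.hex())  # c5
print(len(misread))        # 2
```

If the output of the transformation still contains the bytes c3 85 for this character, the requested encoding has not been applied.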

I do not have control over the raw XML file, so I can't change the way theData is encoded on the way in.

OK, now my XSLT file looks something like this (again simplifying):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:myStuff="myURL" exclude-result-prefixes="myStuff">
<xsl:output method="text" version="1.0" encoding="theEncoding" indent="yes" omit-xml-declaration="yes"/>


<xsl:template match="myStuff:resultset">
<xsl:for-each select="myStuff:data">
   <xsl:text>a bunch of stuff</xsl:text>
   <!-- the context node here is already the current data element -->
   <xsl:value-of select="."/>
   <xsl:text>some more stuff</xsl:text>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>


I have tried setting theEncoding to various static values such as "UTF-8", "iso-8859-1", "ISO-8859-1", other iso-8859-x variants and some other values. My XSLT creates rich text files that need to be opened by various RTF readers. In every case I've tried, theData comes through still encoded as UTF-8 and is not converted to the encoding I asked for.


If I open my final output file in a plain text editor, I see the same arbitrary characters I had originally. If I open the output in a UTF-8 capable reader, the characters do render properly. Since most RTF readers are not going to read UTF-8, how can I get theData converted from UTF-8 to something more digestible?

Since I'm creating Rich Text Format output, I would be happy to solve this problem on the RTF side of things; however, I'd much prefer to solve it by fixing my XSLT stylesheet if possible.
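
As a rough idea of what an RTF-side fix could look like: RTF can carry any character in plain 7-bit ASCII through the \uN control word, where N is a signed 16-bit decimal value followed by a fallback character. Below is a minimal Python sketch, assuming the data values can be escaped in a separate step before they are placed into the RTF. The function name rtf_escape is hypothetical, and the sketch passes control characters such as newlines through unchanged.

```python
def rtf_escape(text: str) -> str:
    """Return text as 7-bit ASCII that is safe to embed in RTF body text."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Emit one \uN per UTF-16 code unit; N is a signed 16-bit
            # decimal, and "?" is the fallback for readers without \u support.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                n = int.from_bytes(units[i:i + 2], "big", signed=True)
                out.append("\\u%d?" % n)
    return "".join(out)


print(rtf_escape("\u00c5"))  # \u197?
```

This must be applied to the data values only, not to the finished file, since it would otherwise also escape the backslashes and braces of the RTF markup itself.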

TIA,

- Russ



Russell Kohn , Chaparral Software & Consulting Services Inc.
Calabasas, California -  http://www.chapsoft.com - 818.225.1247

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list

