RE: [xsl] Problem: XSLT, attribute value, Unicode supplementary characters

Subject: RE: [xsl] Problem: XSLT, attribute value, Unicode supplementary characters
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Sun, 18 Apr 2010 10:09:20 +0100
Check that you are not using the XML parser built into JDK 1.6; use the
Xerces parser from Apache instead. The JDK 1.6 parser has some nasty bugs,
and often corrupts attribute values.
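
One way to see which parser you are actually getting is a small round-trip check like the sketch below. It is only an illustration: the class name ParserCheck, its helper methods, and the three Deseret code points are made up for the example and are not taken from the original document. It asks JAXP for its default SAX parser, parses a one-element document whose correctda attribute holds three supplementary characters, and compares what the parser reports with what went in. A clean result on such a tiny input does not prove the parser is sound, since the corruption may only show up on larger documents, so the printed class name is the more useful part.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ParserCheck {

    /** Creates the XMLReader that JAXP supplies by default in this JVM. */
    static XMLReader defaultReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("could not create a SAX parser", e);
        }
    }

    /** Class name of the default parser; the JDK's bundled one lives under com.sun. */
    static String parserClassName() {
        return defaultReader().getClass().getName();
    }

    /** Parses <case correctda="..."/> and returns the attribute value as reported. */
    static String roundTrip(String value) {
        final String[] seen = new String[1];
        XMLReader reader = defaultReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seen[0] = atts.getValue("correctda");
            }
        });
        try {
            String xml = "<case correctda=\"" + value + "\"/>";
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return seen[0];
    }

    /** Three arbitrary Deseret letters (illustrative only), each a surrogate pair in Java. */
    static String sample() {
        return new StringBuilder()
                .appendCodePoint(0x10414)
                .appendCodePoint(0x10428)
                .appendCodePoint(0x1043B)
                .toString();
    }

    public static void main(String[] args) {
        String sent = sample();
        String received = roundTrip(sent);
        System.out.println("Parser in use: " + parserClassName());
        System.out.println("Attribute intact: " + sent.equals(received));
    }
}
```

If the class name printed starts with com.sun.org.apache.xerces.internal, you are on the JDK's bundled parser. With xercesImpl.jar on the classpath, Saxon's command line accepts -x:org.apache.xerces.parsers.SAXParser to select Apache Xerces for source documents; alternatively, setting the system property javax.xml.parsers.SAXParserFactory to org.apache.xerces.jaxp.SAXParserFactoryImpl should switch the JAXP default.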

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay

> -----Original Message-----
> From: Kenneth Reid Beesley [mailto:krbeesley@xxxxxxxxx]
> Sent: 18 April 2010 06:25
> To: xslt
> Subject: [xsl] Problem: XSLT, attribute value, Unicode
> supplementary characters
>
>
> I've got a problem with XSLT transformation of attribute
> values consisting of Unicode supplementary characters.
>
> Background:
>
> 1.  OS X  10.6.3
> 2.  saxonhe9-2-0-6j
> 3.  The task:  transforming an XML document into XeTeX
> (specifying <xsl:output method="text" encoding="UTF-8"/> )
> 4.  The XML document is well-formed and also validates against a
> Relax NG schema.
> 5.  The XML document is designated as <?xml version="1.0"
> encoding="UTF-8"?>
> 6.  The locale of the operating system is UTF-8
>
>
> Typical Data:
>
> XML:      <case correctda="pp0p;">pp0p;</case>
>
> The value of the attribute named correctda is here a short
> string of three Deseret Alphabet letters, from the Unicode
> supplementary area.
>
> Matching XSLT template:
>
> <xsl:template match="pleft/text/case">{\da <xsl:value-of
> select="@correctda"/>\endnote{\rom Case correction: {\da
> <xsl:value-of select="."/>} $\rightarrow$ {\da <xsl:value-of
> select="@correctda"/>}}}</xsl:template>
>
> Behavior:
>
> 1.  The key problem is the output of the attribute value, via
> <xsl:value-of select="@correctda"/>.  Instead of outputting
> the value pp0p;, as expected, the output is instead a long
> string of unrelated Deseret Alphabet characters.   It's as if
> the value-of function is being confused by the Unicode
> supplementary characters.
>
> 2.  This XSLT script was working a couple of months ago.
> Since then, I did upgrade to OS X 10.6 (Snow Leopard), and in
> trying to fix the current problem, I upgraded to
> saxonhe9-2-0-6j as well.  The problem persists.
>
> Question:
>
> Does anyone know what's happening and how I can fix it?  Has
> something changed in the handling of Unicode supplementary characters?
>
> Thanks,
>
> Ken
>
>
> ******************************
> Kenneth R. Beesley, D.Phil.
> P.O. Box 540475
> North Salt Lake, UT
> 84054  USA
