Re: [xsl] Character Encoding Problem

Subject: Re: [xsl] Character Encoding Problem
From: "Wolfgang Laun wolfgang.laun@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 25 Sep 2014 15:52:32 -0000
Forget that, it's OK as it is, since the output is ISO-8859-1.
-W

On 25 September 2014 17:46, Wolfgang Laun <wolfgang.laun@xxxxxxxxx> wrote:

> The lower bound of the character class in
>    regex="[&#x100;-&#x10FFFF;]"
> should be set to &#x80; (or &#xC0; if you want to be finicky).
> Cheers
> -W
>
>
> On 25 September 2014 17:12, Tony Graham tgraham@xxxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> On Thu, September 25, 2014 11:32 am, Tony Graham tgraham@xxxxxxxxxx
>> wrote:
>> > On Tue, September 23, 2014 9:32 pm, Michael Kay mike@xxxxxxxxxxxx
>> wrote:
>> >> On 23 Sep 2014, at 21:23, Craig Sampson craig.sampson@xxxxxxx
>> >> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> >>>   We're trying to create a java properties file using XSLT 2.0 in
>> >>> Saxon.
>> >>> The input is XML encoded as UTF-8. The properties file needs to be
>> >>> encoded as ISO-8859-1. The character giving the problem, in the input
>> >>> file, is &#x201c; which is a left hand double quote. Looking at the
>> >>> ISO-8859-1 character set the closest character appears to be a double
>> >>> quote - with no hand (left/right).
>> >
>> > To move the goalposts,
>>
>> Since I inadvertently ended up repeating most of Wolfgang Laun's advice,
>> let me try again with something more original:
>>
>> ----
>> <xsl:stylesheet
>>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>     xmlns:m="http://www.mentea.net/namespace"
>>     version="2.0"
>>     exclude-result-prefixes="m xs">
>>
>> <xsl:output method="text" encoding="ISO-8859-1" />
>>
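>> <!-- Handle each character above U+00FF, i.e. outside ISO-8859-1,
>>      one at a time; everything else is copied through unchanged. -->
>> <xsl:template match="text()">
>>   <xsl:analyze-string select="."
>>                       regex="[&#x100;-&#x10FFFF;]">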
>>     <xsl:matching-substring>
>>       <xsl:value-of select="m:escape(.)" />
>>     </xsl:matching-substring>
>>     <xsl:non-matching-substring>
>>       <xsl:value-of select="." />
>>     </xsl:non-matching-substring>
>>   </xsl:analyze-string>
>> </xsl:template>
>>
>> <!-- Escape a single character as a Java-style \uXXXX sequence. -->
>> <xsl:function name="m:escape" as="xs:string">
>>   <xsl:param name="char" as="xs:string" />
>>
>>   <xsl:variable name="hex-chars"
>>                 select="m:to-hex(string-to-codepoints($char))"
>>                 as="xs:string+" />
>>
>>   <!-- Left-pad with zeros to four hex digits; longer values get
>>        no padding. -->
>>   <xsl:sequence
>>       select="string-join(('\u',
>>                            substring('000', count($hex-chars)),
>>                            $hex-chars),
>>                           '')" />
>> </xsl:function>
>>
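>> <!-- Convert a codepoint to a sequence of uppercase hex digits,
>>      most significant digit first. -->
>> <xsl:function name="m:to-hex" as="xs:string+">
>>   <xsl:param name="codepoint" as="xs:decimal" />
>>
>>   <!-- Recurse for the higher-order digits first. -->
>>   <xsl:sequence
>>       select="if ($codepoint >= 16)
>>                 then m:to-hex(floor($codepoint div 16))
>>               else ()" />
>>
>>   <!-- Then emit the lowest-order digit of this call. -->
>>   <xsl:sequence select="substring('0123456789ABCDEF',
>>                                   ($codepoint mod 16) + 1, 1)" />
>> </xsl:function>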
>>
>> </xsl:stylesheet>
>> ----
>>
>> (though it does borrow from and correct
>> http://www.biglist.com/lists/xsl-list/archives/200012/msg00426.html).
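>>
>> One caveat for characters above U+FFFF, which the regex above also
>> matches: m:escape would emit five or six hex digits for them (for
>> example \u1F600), while, as far as I know, Java reads exactly four
>> hex digits after \u in a properties file. Such characters would
>> need a UTF-16 surrogate pair instead. A minimal sketch, assuming
>> the m:escape and m:to-hex functions above; the name m:escape-java
>> is hypothetical and not part of the original stylesheet:
>>
>> ----
>> <!-- Hypothetical wrapper around m:escape: characters above U+FFFF
>>      become two \uXXXX escapes (a UTF-16 surrogate pair). -->
>> <xsl:function name="m:escape-java" as="xs:string">
>>   <xsl:param name="char" as="xs:string" />
>>
>>   <!-- Assumes $char holds exactly one character. -->
>>   <xsl:variable name="cp" as="xs:integer"
>>                 select="string-to-codepoints($char)" />
>>
>>   <!-- 65536 = #x10000, 55296 = #xD800, 56320 = #xDC00 -->
>>   <xsl:sequence
>>       select="if ($cp le 65535)
>>                 then m:escape($char)
>>               else
>>                 for $v in ($cp - 65536)
>>                 return string-join(
>>                          ('\u', m:to-hex(55296 + ($v idiv 1024)),
>>                           '\u', m:to-hex(56320 + ($v mod 1024))),
>>                          '')" />
>> </xsl:function>
>> ----
>>
>> To try it, call m:escape-java(.) instead of m:escape(.) inside
>> xsl:matching-substring. U+1F600, for example, should come out as
>> \uD83D\uDE00.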
>>
>> Regards,
>>
>>
>> Tony Graham                                         tgraham@xxxxxxxxxx
>> Consultant                                       http://www.mentea.net
>> Chair, Print and Page Layout Community Group @ W3C    XML Guild member
>>   --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
>> Mentea       XML, XSL-FO and XSLT consulting, training and programming
