Subject: Re: [xsl] Character Encoding Problem
From: "Wolfgang Laun wolfgang.laun@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 25 Sep 2014 15:47:06 -0000

The lower bound of

#   regex="[Ā-]"

should be set to € (or À if you want to be finicky).

Cheers
-W

On 25 September 2014 17:12, Tony Graham tgraham@xxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, September 25, 2014 11:32 am, Tony Graham tgraham@xxxxxxxxxx wrote:
> > On Tue, September 23, 2014 9:32 pm, Michael Kay mike@xxxxxxxxxxxx wrote:
> >> On 23 Sep 2014, at 21:23, Craig Sampson craig.sampson@xxxxxxx
> >> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>> We're trying to create a java properties file using XSLT 2.0 in
> >>> Saxon.
> >>> The input is XML encoded as UTF-8. The properties file needs to be
> >>> encoded as ISO-8859-1. The character giving the problem, in the input
> >>> file, is “ which is a left hand double quote. Looking at the
> >>> ISO-8859-1 character set the closest character appears to be a double
> >>> quote - with no hand (left/right).
> >
> > To move the goalposts,
>
> Since I inadvertently ended up repeating most of Wolfgang Laun's advice,
> let me try again with something more original:
>
> ----
> <xsl:stylesheet
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     xmlns:m="http://www.mentea.net/namespace"
>     version="2.0"
>     exclude-result-prefixes="m xs">
>
> <xsl:output method="text" encoding="ISO-8859-1" />
>
> <xsl:template match="text()">
>   <xsl:analyze-string select="."
>                       regex="[Ā-]">
>     <xsl:matching-substring>
>       <xsl:value-of select="m:escape(.)" />
>     </xsl:matching-substring>
>     <xsl:non-matching-substring>
>       <xsl:value-of select="." />
>     </xsl:non-matching-substring>
>   </xsl:analyze-string>
> </xsl:template>
>
> <xsl:function name="m:escape" as="xs:string">
>   <xsl:param name="char" as="xs:string" />
>
>   <xsl:variable name="hex-chars"
>                 select="m:to-hex(string-to-codepoints($char))"
>                 as="xs:string+" />
>
>   <xsl:sequence
>       select="string-join(('\u',
>                            substring('000', count($hex-chars)),
>                            $hex-chars),
>                           '')" />
> </xsl:function>
>
> <xsl:function name="m:to-hex" as="xs:string+">
>   <xsl:param name="codepoint" as="xs:decimal" />
>
>   <xsl:sequence
>       select="if ($codepoint >= 16)
>                 then m:to-hex(floor($codepoint div 16))
>               else ()" />
>
>   <xsl:sequence select="substring('0123456789ABCDEF',
>                                   ($codepoint mod 16) + 1, 1)" />
> </xsl:function>
>
> </xsl:stylesheet>
> ----
>
> (though it does borrow from and correct
> http://www.biglist.com/lists/xsl-list/archives/200012/msg00426.html).
>
> Regards,
>
>
> Tony Graham                                   tgraham@xxxxxxxxxx
> Consultant                                 http://www.mentea.net
> Chair, Print and Page Layout Community Group @ W3C
> XML Guild member
>  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
> Mentea       XML, XSL-FO and XSLT consulting, training and programming
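
For anyone who wants to try the lower-bound change in a complete stylesheet, here is a minimal sketch. It is not anyone's posted solution. It reads the suggested lower bound as U+0080, which is an assumption about the intended value, so every character outside 7-bit ASCII is escaped. It also splits code points above U+FFFF into a surrogate pair, since Java's \uXXXX syntax takes exactly four hex digits. The ex: namespace and the function names ex:escape and ex:hex4 are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="http://example.org/hypothetical"
    exclude-result-prefixes="ex xs">

  <xsl:output method="text" encoding="ISO-8859-1"/>

  <!-- Assumed lower bound U+0080: anything that is not 7-bit ASCII
       is replaced by a Java-style escape, one character at a time. -->
  <xsl:template match="text()">
    <xsl:analyze-string select="." regex="[&#x80;-&#x10FFFF;]">
      <xsl:matching-substring>
        <xsl:value-of select="ex:escape(string-to-codepoints(.))"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Hypothetical helper: one code point in, one or two \uXXXX out.
       Code points above 65535 (U+FFFF) become a UTF-16 surrogate pair:
       55296 is 0xD800, 56320 is 0xDC00, 1024 is 0x400. -->
  <xsl:function name="ex:escape" as="xs:string">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence
        select="if ($cp gt 65535)
                then concat('\u', ex:hex4(55296 + ($cp - 65536) idiv 1024),
                            '\u', ex:hex4(56320 + ($cp - 65536) mod 1024))
                else concat('\u', ex:hex4($cp))"/>
  </xsl:function>

  <!-- Hypothetical helper: exactly four upper-case hex digits,
       valid for values from 0 to 65535. -->
  <xsl:function name="ex:hex4" as="xs:string">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:sequence
        select="string-join(
                  for $place in (4096, 256, 16, 1)
                  return substring('0123456789ABCDEF',
                                   ($n idiv $place) mod 16 + 1, 1),
                  '')"/>
  </xsl:function>

</xsl:stylesheet>
```

With this sketch, the left double quotation mark U+201C in the input is written as \u201C, which java.util.Properties.load() decodes back to the original character. Characters in the range U+0080 to U+00FF could also be written directly in ISO-8859-1; escaping them is a design choice that keeps the output pure ASCII.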