Re: [xsl] Combining use-character-maps and normalization-form="NFC" attributes produce unwanted output

Subject: Re: [xsl] Combining use-character-maps and normalization-form="NFC" attributes produce unwanted output
From: "lancelot.meurillon@xxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 16 Feb 2016 16:43:36 -0000
Thanks, Wolfgang.
I have raised an issue: https://saxonica.plan.io/issues/2622

Lancelot

From: Wolfgang Laun wolfgang.laun@xxxxxxxxx
[mailto:xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx]
Sent: Friday 12 February 2016 16:42
To: xsl-list
Subject: Re: [xsl] Combining use-character-maps and normalization-form="NFC" attributes produce unwanted output

Even a lone identity mapping of the semicolon (0x3B),
     <xsl:output-character character=";" string=";"/>
translates every semicolon to U+037E. It seems to be a bug.

Saxon-HE 9.6.0.1
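As a quick cross-check outside Saxon, the following sketch uses Python's standard `unicodedata` module. It is not Saxon code. It shows why U+037E should never appear in NFC output. GREEK QUESTION MARK has a singleton canonical decomposition to U+003B, so NFC maps it back to an ordinary semicolon. U+003B itself is already in NFC.

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037e"
SEMICOLON = "\u003b"

# U+037E has a canonical (singleton) decomposition to U+003B.
print(unicodedata.decomposition(GREEK_QUESTION_MARK))  # 003B

# Singleton decompositions are never recomposed, so NFC yields a plain semicolon.
print(unicodedata.normalize("NFC", GREEK_QUESTION_MARK) == SEMICOLON)  # True

# A plain semicolon is left unchanged by NFC.
print(unicodedata.normalize("NFC", SEMICOLON) == SEMICOLON)  # True
```

A correct NFC pass can therefore only turn U+037E into U+003B, never the reverse. This is consistent with the conclusion that the substitution is a bug.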

On 12 February 2016 at 15:29, lancelot.meurillon@xxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
XSL processor: Saxon-EE 9.5.1.8J from Saxonica
XSLT version: 2.0

Dear all,

For various reasons, I need to escape specific characters in the output and also produce Unicode output normalized to NFC.

Here is my input:

<inputText>”; ;</inputText>  => which is \u201D + \u003B + \u0020 + \u003B



Here are the output properties of my stylesheet:

<xsl:output method="xml" version="1.0" encoding="UTF-8"
        indent="yes" omit-xml-declaration="no"
        use-character-maps="unsupported_characters"
        normalization-form="NFC"
    />



The character map definition:

<xsl:character-map name="unsupported_characters">
    <xsl:output-character character="&#8220;" string="&quot;"/>
    <xsl:output-character character="&#8221;" string="&quot;"/>
</xsl:character-map>



With this template:

<xsl:template match="/">
    <shortDescription><xsl:value-of select="inputText"/></shortDescription>
</xsl:template>



Now the output:

<shortDescription>"; ;</shortDescription>  => which is \u0022 + \u037E + \u0020 + \u003B



Why is the semicolon (\u003B) just after the escaped quote translated into a Greek question mark (\u037E), while the next semicolon is kept?

But the real question is: why is my semicolon converted into a Greek question mark at all?



To go further:

1- Without the character map, the result is:

<shortDescription>”; ;</shortDescription>  => which is \u201D + \u003B + \u0020 + \u003B



2- Without Unicode normalization (no normalization-form="NFC" attribute), the result is:

<shortDescription>"; ;</shortDescription>  => which is \u0022 + \u003B + \u0020 + \u003B
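As a sanity check on what the output should be, here is a minimal Python sketch. It models the "unsupported_characters" map as a plain `str.translate` table and performs NFC with `unicodedata`. This is an assumption made for illustration only and does not reflect how Saxon's serializer works. For this particular input, both orders of the two steps give the same result: \u0022 + \u003B + \u0020 + \u003B.

```python
import unicodedata

# Input text node: RIGHT DOUBLE QUOTATION MARK, semicolon, space, semicolon.
text = "\u201d; ;"

# Illustrative model of the map: U+201C and U+201D both become '"'.
char_map = str.maketrans({"\u201c": '"', "\u201d": '"'})


def apply_map(s: str) -> str:
    return s.translate(char_map)


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


map_then_nfc = nfc(apply_map(text))
nfc_then_map = apply_map(nfc(text))

print([f"U+{ord(c):04X}" for c in map_then_nfc])  # ['U+0022', 'U+003B', 'U+0020', 'U+003B']
print(map_then_nfc == nfc_then_map)  # True
```

Both orders match the result obtained without normalization, so neither step on its own accounts for the U+037E.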



Thanks for your help.

Lancelot









