[xsl] Asymmetric string handling with processing-instructions

Subject: [xsl] Asymmetric string handling with processing-instructions
From: "Michael Mueller-Hillebrand michael.mueller-hillebrand@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 14 Sep 2023 14:05:32 -0000

Dear colleagues,

We are (finally) switching to Saxon 10 (shame on us for being so late) and
have run into the breaking change in the very welcome extension function
saxon:get-pseudo-attribute(). The change history states:


* The extension function saxon:get-pseudo-attribute() now parses the
supplied input much more rigorously, applying the rules found in the W3C
specification <https://www.w3.org/TR/2010/REC-xml-stylesheet-20101028/>, and
raising an error for invalid syntax that was previously allowed through.

Consider this example, which turns <p> elements into PIs and vice versa:
XML:
<section>
    <p>Marks &amp; Spencer Text</p>
    <?my value="Marks &amp; Spencer PI"?>
</section>

XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="#all">

    <xsl:mode on-no-match="shallow-copy"/>
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="p">
        <xsl:processing-instruction name="my" select="'value=&quot;' ||
string(.) || '&quot;'"/>
    </xsl:template>

    <xsl:template match="processing-instruction(my)">
        <p>
            <xsl:text>String: </xsl:text>
            <xsl:value-of select="string(.)"/>
        </p>
        <p>
            <xsl:text>s:g-p-a: </xsl:text>
            <xsl:value-of select="./saxon:get-pseudo-attribute('value')"/>
        </p>
    </xsl:template>
</xsl:stylesheet>

The result with Saxon 10+ is:
<section>
   <?my value="Marks & Spencer Text"?>
   <p>String: value="Marks &amp;amp; Spencer PI"</p>
   <p>s:g-p-a: Marks &amp; Spencer PI</p>
</section>

The PI in the result illustrates the part of the spec which says: "Note that
special characters occurring within the PI text will not be escaped."

My bottom line: if you want to use saxon:get-pseudo-attribute() because it is
elegant and efficient, and your processing instructions may contain user
content, you have two additional tasks:
* When creating processing instructions with xsl:processing-instruction or by
other means, escape the five special XML characters (&, <, >, ", ') yourself.
* When reading PI string values without saxon:get-pseudo-attribute(), add an
unescaping routine to avoid double-escaped content.
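
The two tasks could be sketched roughly as follows. This is only a minimal
illustration, not a tested solution: the functions my:escape and my:unescape
and the namespace urn:example:my are hypothetical names made up for this
example, and the "=>" operator assumes XPath 3.1 support (which Saxon 10
provides). Numeric character references are not handled, and for brevity
my:unescape is applied to the whole PI string value.

XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my"
    exclude-result-prefixes="#all">

    <xsl:mode on-no-match="shallow-copy"/>
    <xsl:output method="xml" indent="yes"/>

    <!-- Hypothetical helper: escape the five special characters.
         The ampersand is replaced first, so the ampersands introduced by
         the later replacements are not escaped a second time. -->
    <xsl:function name="my:escape" as="xs:string">
        <xsl:param name="s" as="xs:string"/>
        <xsl:sequence select="
            $s => replace('&amp;', '&amp;amp;')
               => replace('&lt;', '&amp;lt;')
               => replace('&gt;', '&amp;gt;')
               => replace('&quot;', '&amp;quot;')
               => replace('''', '&amp;apos;')"/>
    </xsl:function>

    <!-- Hypothetical helper: the reverse mapping.
         The ampersand is restored last, so that an escaped ampersand
         followed by "lt;" is not mistaken for an escaped "<". -->
    <xsl:function name="my:unescape" as="xs:string">
        <xsl:param name="s" as="xs:string"/>
        <xsl:sequence select="
            $s => replace('&amp;lt;', '&lt;')
               => replace('&amp;gt;', '&gt;')
               => replace('&amp;quot;', '&quot;')
               => replace('&amp;apos;', '''')
               => replace('&amp;amp;', '&amp;')"/>
    </xsl:function>

    <!-- Task 1: escape user content before it goes into the PI -->
    <xsl:template match="p">
        <xsl:processing-instruction name="my"
            select="'value=&quot;' || my:escape(string(.)) || '&quot;'"/>
    </xsl:template>

    <!-- Task 2: unescape when reading the raw PI string value -->
    <xsl:template match="processing-instruction(my)">
        <p>
            <xsl:text>String: </xsl:text>
            <xsl:value-of select="my:unescape(string(.))"/>
        </p>
    </xsl:template>
</xsl:stylesheet>

A side effect of escaping ">" as well is that user content containing "?>"
can no longer terminate the PI early.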

How do you deal with this asymmetry?

Best regards
- Michael


Michael Müller-Hillebrand
Senior Consultant
Phone +49 951-20859-752
Mobil +49 172-819 34 13
michael.mueller-hillebrand@xxxxxxxxx<mailto:michael.mueller-hillebrand@docufy
.de>
www.docufy.de<https://www.docufy.de/> |
DOCUFY@LinkedIN<https://www.linkedin.com/company/3845358/>
Datenschutz<https://www.docufy.de/datenschutz/>
DOCUFY GmbH | Kirschäckerstr. 27 | 96052 Bamberg | Deutschland
CEO: Nadine Prill | Amtsgericht Bamberg HRB10571
