[xsl] XSLT 1.0 serializer for XML

Subject: [xsl] XSLT 1.0 serializer for XML
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Thu, 19 Aug 2010 16:26:27 +0200
Hello,

in another thread Florent posted [1] a link to an XML serializer written
in pure XSLT [2].
That serializer is written in XSLT 2.0 and cannot be used in browsers,
which support only XSLT 1.0.

With Michael's help I got the differentiation of the 6 XML node types
(text, comment, processing-instruction, element, attribute and namespace)
right [3] and was able to output "readable" XML, even for attributes and
namespaces.
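
For illustration, the node-type distinction can be sketched with the same
set-union idiom that doOutput in serialize.xsl uses: a node belongs to a
node-set exactly when adding it to the set does not change the count.
The template name nodeKind is hypothetical, and this is only a sketch,
not necessarily the solution worked out in [3].

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- hypothetical helper: reports the kind of the context node -->
  <xsl:template name="nodeKind">
    <xsl:choose>
      <!-- the context node is among its parent's namespace nodes -->
      <xsl:when test="count(. | ../namespace::*) = count(../namespace::*)">
        <xsl:text>namespace</xsl:text>
      </xsl:when>
      <!-- the context node is among its parent's attributes -->
      <xsl:when test="count(. | ../@*) = count(../@*)">
        <xsl:text>attribute</xsl:text>
      </xsl:when>
      <xsl:when test="self::*">element</xsl:when>
      <xsl:when test="self::text()">text</xsl:when>
      <xsl:when test="self::comment()">comment</xsl:when>
      <xsl:when test="self::processing-instruction()">
        <xsl:text>processing-instruction</xsl:text>
      </xsl:when>
      <!-- only the root node is left -->
      <xsl:otherwise>root</xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The namespace and attribute tests come first because the self:: axis has
the element as its principal node type, so self::* cannot identify those
two kinds.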

For another tool I modified that to generate HTML output, and by reading
Florent's posting I realized that it already did XML serialization.

I extracted the serializer; you can find it below and at [4].
There is also an XML file online [5] demonstrating the new features:
it serializes and displays its own content, the demo XSLT and serialize.xsl
with some comments and links -- try it out!


To verify correct behavior I used serialize-test.xsl [6] and
compared its output, displayed in a browser, with the output of
<xsl:copy-of select="/"/>. This was really useful for handling
characters from CDATA sections that need to be escaped [7].


Question 1:
While '<' and '&' must be escaped, '>' need not be (except in the
sequence ']]>'). But the output of <xsl:copy-of select="/"/> escapes
the '>', too.
This is why template escapeLtGtAmp escapes all three: to match the
copy-of behavior.
Why does xsl:copy-of escape '>'?


Question 2:
The displayed output looks quite nice in the Firefox, Chrome, Safari and
Opera browsers (Firefox does not support the namespace:: axis, so it
cannot display namespaces).
Why does the serialized XML displayed by IE6 and IE8 look completely
different (ugly) compared with all the other browsers?


Question 3:
Is it correct that a stylesheet has no access to CDATA sections?
(I think the parser removes them.)
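
A small experiment along these lines might look as follows. It is only a
sketch: the document and its element names (doc, a, b) are made up for
illustration. In the XPath 1.0 data model a CDATA section is treated as
if its characters had been entered directly with escaping, so both
elements should look the same to the stylesheet.

```xml
<!-- hypothetical input document:

     <doc><a><![CDATA[1 < 2]]></a><b>1 &lt; 2</b></doc>
-->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/doc">
    <!-- compares the string values of a and b -->
    <xsl:value-of select="a = b"/>
    <xsl:text> </xsl:text>
    <!-- the CDATA content arrives as one ordinary text node -->
    <xsl:value-of select="count(a/text())"/>
  </xsl:template>
</xsl:stylesheet>
```

A conforming processor should print "true 1" here: nothing in the tree
records which of the two forms was used in the source.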


Question 4:
Is it correct that a stylesheet cannot access the "original" attribute
values (including e.g. newlines), but only the result of Attribute-Value
Normalization [8]?
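
The effect of [8] can be sketched with a made-up document (the names doc,
a and b are hypothetical). According to [8], a literal line break in an
attribute value is normalized to a space, while a character reference
such as &#10; is replaced by the referenced character and so survives as
a real newline.

```xml
<!-- hypothetical input document:

     <doc a="x
     y" b="x&#10;y"/>
-->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/doc">
    <!-- literal line break in the source: already turned into a space -->
    <xsl:value-of select="contains(@a, '&#10;')"/>
    <xsl:text> </xsl:text>
    <!-- character reference in the source: newline character preserved -->
    <xsl:value-of select="contains(@b, '&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

With a conforming parser this should print "false true". Note that the
&#10; inside the select attributes of the stylesheet itself is preserved
for the same reason.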


[1] http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201008/msg00181.html
[2] http://code.google.com/p/xlibs/source/browse/serial/trunk
[3] http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201008/msg00161.html
[4] http://stamm-wilbrandt.de/en/xsl-list/serialize/serialize.xsl
[5] http://stamm-wilbrandt.de/en/xsl-list/serialize/serialize-demo.xml
[6] http://stamm-wilbrandt.de/en/xsl-list/serialize/serialize-test.xsl
[7] http://www.w3.org/TR/REC-xml/#syntax
[8] http://www.w3.org/TR/REC-xml/#AVNormalize

<!--
     XSLT 1.0 serializer for XML

     remarks:
     - generates output nearly identical to <xsl:copy-of select="/"/>;
       the differences are:
       - all attributes are output before namespace declarations
       - attribute values might differ because of AVN

     - since stylesheet does not have access to CDATA sections
       it has to use template escapeLtGtAmp to ensure correct
       escaping; overhead of 1 x call-template + 3 x contains()
       for text output not containing any of &lt; , &gt; and &amp;

     - because of "Attribute-Value Normalization" no newlines in
       attribute values; this might change visual presentation
       as can be seen in first <xsl:when>'s test attribute

     - character and entity references like &#10; and &quot; in the XML
       file are expanded by the parser, are not accessible by the
       stylesheet and are displayed as the plain characters


          serialize.xsl: XML serializer

     serialize-demo.xml: demonstration file (open in browser)
     serialize-demo.xsl: referenced demonstration

            copy-of.xsl: for comparison with serialize-test.xsl output
     serialize-test.xsl: for comparison with copy-of.xsl output;
                         view output in browser for comparing
-->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:template name="doOutput">
    <xsl:choose>
      <xsl:when test="count(. | ../namespace::*) !=
                      count(../namespace::*)">
        <xsl:apply-templates select="." mode="output"/>
      </xsl:when>

      <xsl:otherwise>
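        <!-- substring(':',1 div boolean(name())) yields ':' for a prefixed
             namespace and '' for the default one (1 div 0 is Infinity) -->
        <xsl:value-of select="concat('xmlns',
                                     substring(':',1 div boolean(name())),
                                     name(),'=&quot;',.,'&quot;')"/>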
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="@*" mode="output">
    <xsl:value-of select="concat(' ',name(),'=&quot;',.,'&quot;')"/>
  </xsl:template>

  <xsl:template match="*" mode="output">
    <!-- for xsl:copy-of behavior of xsltproc;
         Saxon, xalan and DataPower XSLT processors do not do this.
    <xsl:if test="(.=/) and
                  (preceding::comment()|preceding::processing-instruction())">
      <xsl:text>&#10;</xsl:text>
    </xsl:if>
    -->

    <xsl:value-of select="concat('&lt;',name())"/>

    <xsl:apply-templates select="@*" mode="output"/>

    <xsl:for-each select="namespace::*">
      <xsl:if test="not(.=../../namespace::*) and name()!='xml'">
        <xsl:value-of select="concat(' xmlns',
                                     substring(':',1 div boolean(name())),
                                     name(),'=&quot;',.,'&quot;')"/>
      </xsl:if>
    </xsl:for-each>

    <xsl:choose>
      <xsl:when test="*|text()|comment()|processing-instruction()">
        <xsl:text>></xsl:text>

        <xsl:apply-templates
          select="*|text()|comment()|processing-instruction()"
          mode="output"/>

        <xsl:value-of select="concat('&lt;/',name(),'>')"/>
      </xsl:when>

      <xsl:otherwise>
        <xsl:text>/></xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="comment()" mode="output">
    <xsl:value-of select="concat('&lt;!--',.,'-->')"/>
  </xsl:template>

  <xsl:template match="processing-instruction()" mode="output">
    <xsl:value-of select="concat('&lt;?',name(),' ',.,'?>')"/>
  </xsl:template>

  <xsl:template match="text()" mode="output">
    <!--
         overhead: 1 x call-template + 3 x contains() for text without
         CDATA
    -->
    <xsl:call-template name="escapeLtGtAmp">
      <xsl:with-param name="str" select="."/>
    </xsl:call-template>
  </xsl:template>


  <xsl:template name="escapeLtGtAmp">
    <xsl:param name="str"/>

    <xsl:choose>
      <xsl:when test="contains($str,'&lt;') or
                      contains($str,'&gt;') or
                      contains($str,'&amp;')">
        <xsl:variable name="lt"
          select="substring-before(concat($str,'&lt;'),'&lt;')"/>
        <xsl:variable name="gt"
          select="substring-before(concat($str,'&gt;'),'&gt;')"/>
        <xsl:variable name="amp"
          select="substring-before(concat($str,'&amp;'),'&amp;')"/>

        <xsl:choose>
          <xsl:when test="string-length($gt) > string-length($amp)">
            <xsl:choose>
              <xsl:when test="string-length($amp) > string-length($lt)">
                <xsl:value-of
                  select="concat(substring-before($str,'&lt;'),'&amp;lt;')"/>

                <xsl:call-template name="escapeLtGtAmp">
                  <xsl:with-param name="str"
                    select="substring-after($str,'&lt;')"/>
                </xsl:call-template>
              </xsl:when>

              <xsl:otherwise>
                <xsl:value-of
                  select="concat(substring-before($str,'&amp;'),'&amp;amp;')"/>

                <xsl:call-template name="escapeLtGtAmp">
                  <xsl:with-param name="str"
                    select="substring-after($str,'&amp;')"/>
                </xsl:call-template>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:when>

          <xsl:otherwise>
            <xsl:choose>
              <xsl:when test="string-length($gt) > string-length($lt)">
                <xsl:value-of
                  select="concat(substring-before($str,'&lt;'),'&amp;lt;')"/>

                <xsl:call-template name="escapeLtGtAmp">
                  <xsl:with-param name="str"
                    select="substring-after($str,'&lt;')"/>
                </xsl:call-template>
              </xsl:when>

              <xsl:otherwise>
                <xsl:value-of
                  select="concat(substring-before($str,'&gt;'),'&amp;gt;')"/>

                <xsl:call-template name="escapeLtGtAmp">
                  <xsl:with-param name="str"
                    select="substring-after($str,'&gt;')"/>
                </xsl:call-template>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>

      <xsl:otherwise>
        <xsl:value-of select="$str"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
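
A minimal driver for trying the serializer might look like the sketch
below. It assumes the stylesheet above has been saved as serialize.xsl
in the same directory; the driver is hypothetical and is not the
serialize-demo.xsl or serialize-test.xsl mentioned above.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- assumption: the serializer is stored next to this file -->
  <xsl:import href="serialize.xsl"/>

  <!-- text output, so the generated markup is not escaped a second time -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- serialize top-level comments, processing instructions and the
         document element in document order -->
    <xsl:apply-templates select="node()" mode="output"/>
  </xsl:template>
</xsl:stylesheet>
```

For display in a browser the text output method is not enough; the
serialized string would have to be wrapped in HTML, for example in a
pre element.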


Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Dirk Wittkopp
Registered office: Boeblingen
Registration court: Amtsgericht Stuttgart, HRB 243294
