[xsl] removing final space from node tree

Subject: [xsl] removing final space from node tree
From: Lars Huttar <lars_huttar@xxxxxxx>
Date: Mon, 20 Apr 2009 15:09:31 -0500
Hello,

I have the following real-world scenario.

Our stylesheet inputs some data-oriented XML (a series of
<language-entry> elements and their descendants), and outputs
documentish XML (a series of <language-desc> elements with descendants),
basically structuring the data into a more human-oriented stream prior
to typesetting.

The input conforms to a DTD, but many pieces are optional, so the
structure is not very predictable. For this reason, the transformation
outputs a space at the end of processing any input (sub-)element, to
separate the text output for that element from following text output, if
any.
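
For illustration, the pattern looks roughly like the sketch below. This is not our real stylesheet: the assumption that every child of <language-entry> can be handled by one generic template is made up for the example, and the real templates are more varied.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- One output <language-desc> per input <language-entry>. -->
  <xsl:template match="language-entry">
    <language-desc>
      <xsl:apply-templates select="*"/>
    </language-desc>
  </xsl:template>

  <!-- Hypothetical generic rule: each sub-element emits its text,
       then a separating space, whether or not anything follows it. -->
  <xsl:template match="language-entry/*">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text> </xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With a rule like this, the last sub-element processed also emits its separator, which is where the trailing space comes from.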

Unfortunately, this means that the text content of each output
<language-desc> element often ends with an unwanted space. Worse, this
extra space causes confusion downstream. So the question is, how to get
rid of this extra space (if any)?

I've coded a solution, and it works, but it's kind of ugly. It involves
post-processing the result of the above transformation (thank you Saxon,
we are no longer stuck with immutable result tree fragments!). Basically
we look for the final text node of the output tree, and if it ends with a
space, we apply a recursive template in mode "strip-final-space", passing
the last text node as a parameter. The recursive template performs an
identity transformation, except for that last text node, which it strips
of its final space. Here is the relevant code:

  <!-- We need to remove any extraneous final spaces. -->
  <!-- set context node to language-desc -->
  <xsl:for-each select="$language-desc">
    <!-- find last text node in descendants -->
    <xsl:variable name="last-text-node" select="(.//text())[last()]"/>
    <xsl:choose>
      <xsl:when test="ends-with($last-text-node, ' ')">
        <xsl:apply-templates mode="strip-final-space" select=".">
          <xsl:with-param name="last-text-node"
              select="$last-text-node"/>
        </xsl:apply-templates>
      </xsl:when>
      <xsl:otherwise><xsl:sequence select="."/></xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>

...

    <!-- copy the context node tree, removing final space from the node
         that is passed as a parameter. -->
    <xsl:template mode="strip-final-space" match="@*|node()">
        <xsl:param name="last-text-node" />
        <xsl:choose>
            <xsl:when test=". is $last-text-node">
                <xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:copy>
                    <xsl:apply-templates mode="strip-final-space"
                            select="@*|node()">
                        <xsl:with-param name="last-text-node"
                                select="$last-text-node"/>
                    </xsl:apply-templates>
                </xsl:copy>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>


My concern is that this seems like a very complex way to have to
eliminate a final space. I'm also a bit concerned about performance. I
haven't tested that yet, but we have been running up against CPU and
memory bottlenecks, and I hate to add another level of processing to the
output of a transformation. Can anyone suggest a simpler way to
eliminate the final space -- either in the initial transformation, or in
the postprocessing? Is there a simpler way to remove a final space from
a node tree?

One idea would be, during the initial transformation, to output a
following space only *if* there are certain following data elements in
the input. But it seems to me that would take some fairly complex code in
itself, sorting out what conditionals are needed at each point; and it
would be difficult to maintain as the input structure changes. But maybe
I'm just lazy about apparently tedious tasks.
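
To make that idea concrete, here is a minimal sketch of the simplest possible case. It assumes, hypothetically, that all the sub-elements are siblings directly under <language-entry> and that every one of them produces output; neither is true of our real data, which is why I expect the real conditionals to be messier.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="language-entry">
    <language-desc>
      <xsl:apply-templates select="*"/>
    </language-desc>
  </xsl:template>

  <!-- Hypothetical flat case: emit the separating space only when
       another sibling element follows in the input. -->
  <xsl:template match="language-entry/*">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:if test="following-sibling::*">
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Once elements are nested, or some following elements produce no text, the test on following-sibling::* is no longer enough, and each template would need its own notion of "something printable follows".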

Any suggestions would be appreciated.

Lars
