Re: [xsl] Removing unwanted space

From: "Graydon graydon@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 4 Jun 2021 00:36:35 -0000
On Thu, Jun 03, 2021 at 11:54:25PM -0000, Charles O'Connor
coconnor@xxxxxxxxxxxx scripsit:
> OK, I've tried this a bunch of ways and failed (using XSLT 2.0).
> 
> The XML I'm working with has a bunch of unwanted whitespace in all
> sorts of places, but looking just at paragraphs, it can have

What you want is, in principle,

<xsl:template match="text()[not(normalize-space())]"/>

which will delete all the whitespace-only text nodes, and

<xsl:template match="text()[normalize-space()]">
    <xsl:value-of select="normalize-space(.)"/>
</xsl:template>

which will get rid of leading and trailing whitespace in NON
whitespace-only text nodes (that is, text nodes that have words in
them), as well as converting any run of whitespace into a single
space character.

This has at least a couple drawbacks; drawback the first, the "I have
one space between inline markup in mixed content" case, which this will
delete.  Drawback the second, you might not want to get rid of leading
or trailing white space in a text node because you're crunching it up
against some inline markup -- "words about <b>thingy</b>" should not
become "words about<b>thingy</b>" but if you use the naive version of
this it will.
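
To see both drawbacks at once, here is a minimal sketch of the naive
version as a complete stylesheet. The identity template is my addition
(an assumption about how the rest of the document gets copied through),
not something from the original question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed identity template: copy everything not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop every whitespace-only text node -->
  <xsl:template match="text()[not(normalize-space())]"/>

  <!-- Trim and collapse whitespace in text nodes that have content -->
  <xsl:template match="text()[normalize-space()]">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

</xsl:stylesheet>
```

Given the made-up input

<p>words about <b>thingy</b> <i>other</i></p>

the p element comes out as

<p>words about<b>thingy</b><i>other</i></p>

The trailing space after "about" is trimmed, and the single space
between the b and i elements is a whitespace-only text node, so it is
dropped entirely.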

The slightly less naive approach could be:

<xd:doc>
  <xd:desc>murder the line breaks and clumps of spaces</xd:desc>
</xd:doc>
<xsl:template match="text()[normalize-space()]" >
  <xsl:analyze-string regex="[&#x000A;&#x0020;&#x0009;&#x000D;]+" select=".">
    <xsl:matching-substring>
      <xsl:text>&#x0020;</xsl:text>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="." />
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

<xd:doc>
  <xd:desc>disdain the interstitial (slightly carefully)</xd:desc>
</xd:doc>
<xsl:template match="text()[not(normalize-space())][not(matches(.,'^&#x0020;$'))]">
  <!-- thud -->
  <!-- not actual words, not a single space -->
</xsl:template>

This will keep a single space wherever a text node had leading or
trailing whitespace, collapse any run of whitespace into a single
space, and leave the single space (but not, as written, _two_ spaces)
between inline markup elements.
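
As a rough sketch of how the two templates above fit into a runnable
stylesheet: the identity template is again my assumption, and I have
used plain comments instead of xd:doc so that no extra namespace
declaration is needed.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed identity template: copy everything not handled below.
       This is also what preserves a lone single-space text node. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text with content: collapse each run of whitespace to one space,
       without trimming the ends -->
  <xsl:template match="text()[normalize-space()]">
    <xsl:analyze-string regex="[&#x000A;&#x0020;&#x0009;&#x000D;]+" select=".">
      <xsl:matching-substring>
        <xsl:text>&#x0020;</xsl:text>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Whitespace-only text that is not exactly one space: drop it -->
  <xsl:template match="text()[not(normalize-space())][not(matches(.,'^&#x0020;$'))]"/>

</xsl:stylesheet>
```

Given the made-up input

<p>  Some   words about <b>thingy</b> <i>other</i>
  </p>

the p element comes out as

<p> Some words about <b>thingy</b> <i>other</i></p>

The leading whitespace collapses to one space rather than vanishing,
the space after "about" survives, the single space between the b and i
elements is copied through, and the line break plus indent before the
closing tag is dropped.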


-- 
Graydon Saunders  | graydonish@xxxxxxxxx
Þæs oferiode, þisses swa mæg.
-- Deor  ("That passed, so may this.")
