[xsl] Removing unwanted space

Subject: [xsl] Removing unwanted space
From: "Charles O'Connor coconnor@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 3 Jun 2021 23:54:18 -0000

OK, I've tried this a bunch of ways and failed (using XSLT 2.0).

The XML I'm working with has a bunch of unwanted whitespace in all sorts of
places, but looking just at paragraphs, it can have

<p>
	The rain in <bold>Spain</bold> <italic>is</italic> wet.
</p>

Or

<p>
	<bold>The rain in Spain is wet.</bold>
</p>

What I (and any semi-sane person) want is this (TBH, it's really the online XML
editor that wants it):

<p>The rain in <bold>Spain</bold> <italic>is</italic> wet.</p>

Or

<p><bold>The rain in Spain is wet.</bold></p>

In some places the XML actually starts this way, but it's not consistent at
all.

One track I went down dead-ended because, apparently, a regular expression
can't be constructed in a way that could return an empty string. Me, I'd have
been fine with the occasional empty string, because it would have been an empty
string of things I did not want, if that makes any sense (and it does not).

Anyway, my attempt to get around that was to look at the first text node and
see if it started with spaces and if so to get rid of them:

    <xsl:template match="p/text()[1]">
        <xsl:choose>
            <xsl:when test="matches(.,'^\s+.*')">
                <xsl:analyze-string select="." regex="^\s+(\S?.*)">
                    <xsl:matching-substring>
                        <xsl:value-of select="regex-group(1)"/>
                    </xsl:matching-substring>
                </xsl:analyze-string>
            </xsl:when>
            <xsl:otherwise>
                <xsl:apply-templates/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

And sure, I know that the first text node might in reality come after some
content in a child of <p>, but I was willing to cross that bridge when I
actually mangled some content. But for this template, I got a warning: "The
child axis starting at a text node node will never select anything", which is
rather dreary.

Anyway, I'm a little loopy from banging my head against this, but one way or
another, I'm missing something. I think I'm treating the text node only as a
string, not as a node with children, but apparently I only think that and I am
wrong, because the machine is smarter than I am.
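The warning comes from the otherwise branch: a bare xsl:apply-templates selects
child::node(), and a text node never has children, so the processor can tell
statically that it will select nothing. Below is a possible rework of the template
above (a sketch, not tested against the real data) that writes out the string
value instead and lets replace() do the conditional work. Note also that in the
original, "." does not match a newline unless flags="s" is given, so any part of
the text node after a line break would have fallen into the non-matching
substring, which has no handler, and been silently dropped.

```xslt
<!-- Sketch: still matches the first text child of p, which (as noted above)
     may come after an element child -->
<xsl:template match="p/text()[1]">
    <!-- replace() returns the text unchanged when there is no leading
         whitespace, so no xsl:choose is needed; xsl:value-of outputs the
         string instead of applying templates to the (nonexistent)
         children of a text node -->
    <xsl:value-of select="replace(., '^\s+', '')"/>
</xsl:template>
```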

Any help for how to get rid of the space at the beginning and end of
paragraphs without getting rid of the space between elements within the
paragraph would be appreciated.
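One way to do this in XSLT 2.0, offered as a sketch rather than a definitive
answer: keep an identity transform, and only touch text nodes that are the very
first or very last child node of a p, so a text node that follows a <bold> child
is left alone. Leading whitespace is trimmed when nothing precedes the text node,
trailing whitespace when nothing follows it, and a text node that is the only
child gets both. It assumes the element really is named p, and it does not try
to trim whitespace inside a child element such as <bold>.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Identity template: copy everything through unchanged by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Text directly inside p that is the first and/or last child node -->
    <xsl:template match="p/text()[not(preceding-sibling::node())
                                  or not(following-sibling::node())]">
        <!-- Trim leading whitespace only if nothing comes before this node -->
        <xsl:variable name="s"
            select="if (preceding-sibling::node())
                    then string(.)
                    else replace(., '^\s+', '')"/>
        <!-- Trim trailing whitespace only if nothing comes after this node;
             an empty result produces no text node in the output -->
        <xsl:value-of
            select="if (following-sibling::node())
                    then $s
                    else replace($s, '\s+$', '')"/>
    </xsl:template>

</xsl:stylesheet>
```

Run against the two sample paragraphs, this should give exactly the two desired
outputs: the single space between </bold> and <italic> is neither first nor last,
so the identity template copies it through. That is also why
xsl:strip-space elements="p" is a poor fit here: it would delete that
whitespace-only text node along with the unwanted ones, producing
"Spain</bold><italic>is".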

Thanks!
Charles


Charles O'Connor | Business Systems Analyst
Pronouns: He/Him
Aries Systems Corporation | www.ariessys.com
50 High Street, Suite 21 | North Andover, MA | 01845 | USA


Main: +1 (978) 975-7570
Cell: +1 (802) 585-5655