Re: [xsl] Removing unwanted space

Subject: Re: [xsl] Removing unwanted space
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 4 Jun 2021 11:36:25 -0000

Hey Charles,

A couple of techniques I use in this situation:

text()[. is ancestor::p/descendant::text()[1]] -  matches the first text
node in a p, no matter how deep.
text()[. is ancestor::p/descendant::text()[last()]] - same for the end

text()[not(matches(.,'\S'))] - text that has no non-whitespace character

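replace($str,'^\s+','') - strip *leading whitespace only* from a string.
replace($str,'\s+$','') - same for trailing whitespace
(Use \s+ rather than \s* here: replace() raises an error, FORX0003, if
the regex can match a zero-length string.)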

Et sim.
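
If the two replace() calls turn up in several places, they could be
wrapped in stylesheet functions. A minimal sketch, assuming XSLT 2.0; the
my: namespace URI and the function names are made up for illustration.
It uses \s+ since replace() errors out (FORX0003) on a regex that can
match a zero-length string.

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/ns/local"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: remove whitespace at the start of a string -->
  <xsl:function name="my:trim-start" as="xs:string">
    <xsl:param name="str" as="xs:string?"/>
    <xsl:sequence select="replace($str, '^\s+', '')"/>
  </xsl:function>

  <!-- Hypothetical helper: remove whitespace at the end of a string -->
  <xsl:function name="my:trim-end" as="xs:string">
    <xsl:param name="str" as="xs:string?"/>
    <xsl:sequence select="replace($str, '\s+$', '')"/>
  </xsl:function>

</xsl:stylesheet>

With these, my:trim-start('  abc ') gives 'abc ' and
my:trim-end('  abc ') gives '  abc'; whitespace inside the string is
left alone.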

I am not sure I would use xsl:analyze-string here since as you observe it
can be (um) pesky. I might do something as simple as

<xsl:template match="text()[. is ancestor::p/descendant::text()[1]]">
  <xsl:value-of select="replace(., '^\s+', '')"/>
</xsl:template>

But the match might have to be greedier if the inline markup is also
deep, and this is only the front end.
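
For what it's worth, here is a sketch of how both ends might be handled
in one go. It assumes XSLT 2.0, an identity transform for everything
else, and that only the nearest enclosing p matters; the variable names
are mine, and I have not run it against your real data.

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy anything not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any text node inside a p, at any depth -->
  <xsl:template match="p//text()">
    <!-- All text nodes under the nearest enclosing p, in document order -->
    <xsl:variable name="all" select="ancestor::p[1]//text()"/>
    <!-- Strip leading whitespace only if this is the first of them -->
    <xsl:variable name="front"
        select="if (. is $all[1])
                then replace(., '^\s+', '')
                else string(.)"/>
    <!-- Strip trailing whitespace only if this is the last of them -->
    <xsl:value-of
        select="if (. is $all[last()])
                then replace($front, '\s+$', '')
                else $front"/>
  </xsl:template>

</xsl:stylesheet>

Text nodes in the middle pass through untouched, so the space between
</bold> and <italic> in your first example should survive, while a
whitespace-only text node at either end of the p comes out empty and is
dropped.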

This is not an easy problem since the (very smart) computer doesn't know
the difference between "white space that matters" and "white space that
doesn't matter". Indeed its whole notion of "white space" is somewhat
problematic. So I'm not sure who's actually smarter. :-)

Cheers, Wendell





On Thu, Jun 3, 2021 at 7:54 PM Charles O'Connor coconnor@xxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> OK, I've tried this a bunch of ways and failed (using XSLT 2.0).
>
> The XML I'm working with has a bunch of unwanted whitespace in all sorts
> of places, but looking just at paragraphs, it can have
>
> <p>
>         The rain in <bold>Spain</bold> <italic>is</italic> wet.
> </p>
>
> Or
>
> <p>
>         <bold>The rain in Spain is wet.</bold>
> </p>
>
> What I and any semi-sane person wants is (TBH, it's the online XML editor
> that wants it):
>
> <p>The rain in <bold>Spain</bold> <italic>is</italic> wet.</p>
>
> Or
>
> <p><bold>The rain in Spain is wet.</bold></p>
>
> In some places the XML actually starts this way, but it's not consistent
> at all.
>
> One track I went down dead-ended at regular expressions not being able to
> be constructed in a way that could return an empty string. Me, I'd have
> been fine with the occasional empty string, because it would have been an
> empty string of things I did not want, if that makes any sense (and it does
> not).
>
> Anyway, my attempt to get around that was to look at the first text node
> and see if it started with spaces and if so to get rid of them:
>
>     <xsl:template match="p/text()[1]">
>         <xsl:choose>
>             <xsl:when test="matches(.,'^\s+.*')">
>                  <xsl:analyze-string select="." regex="^\s+(\S?.*)">
>                     <xsl:matching-substring>
>                         <xsl:value-of select="regex-group(1)"/>
>                     </xsl:matching-substring>
>                 </xsl:analyze-string>
>             </xsl:when>
>             <xsl:otherwise>
>                 <xsl:apply-templates/>
>             </xsl:otherwise>
>         </xsl:choose>
>     </xsl:template>
>
> And sure, I know that the first text node might in reality come after some
> content in a child of <p>, but I was willing to cross that bridge when I
> actually mangled some content. But for this template, I got a warning: "The
> child axis starting at a text node node will never select anything", which
> is rather dreary.
>
> Anyway, I'm a little loopy with banging my head against this, but one way
> or another, I'm missing this. I'm only treating the text node as a string,
> not as a node with children, but apparently I only think that and I am
> wrong, because the machine is smarter than I am.
>
> Any help for how to get rid of the space at the beginning and end of
> paragraphs without getting rid of the space between elements within the
> paragraph would be appreciated.
>
> Thanks!
> Charles
>
>
> Charles O'Connor l Business Systems Analyst
> Pronouns: He/Him
> Aries Systems Corporation l www.ariessys.com
> 50 High Street, Suite 21 l North Andover, MA l 01845 l USA
>
>
> Main: +1 (978) 975-7570
> Cell: +1 (802) 585-5655
>
>
> 
>
>

-- 
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
