Re: [xsl] Correcting misplaced spaces in XML documents

Subject: Re: [xsl] Correcting misplaced spaces in XML documents
From: "Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 26 Mar 2023 13:39:01 -0000
Thank you Gerrit, that looks like a very useful project which I will have a
close look at.
I would not have thought of the complication with footnotes without your
comments, but that's something I could well encounter in our documents.

Thanks to others who made suggestions too.

(Syd) I can't be completely generic because there are elements where leading
spaces really are significant (e.g. code snippets). But I'll look at your
methods as well.

cheers
T

-----Original Message-----
From: Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Sunday, 26 March 2023 23:21
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Correcting misplaced spaces in XML documents

Hi Trevor,

emphasis-normalize-space [1] can deal with whitespace within nested elements
and with embedded footnotes whose accidental leading or trailing whitespace
shouldn't be pulled out and put into the surrounding paragraph.

Gerrit

[1] https://github.com/gimsieke/emphasis-normalize-space

On 26.03.2023 03:33, Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx wrote:
> I suppose this falls into the category of data cleanup.
>
> In the very simple case I am importing documents which have content
> like this:
>
>      <para>Press the<keyname> Escape </keyname>key.</para>
>
> You'll notice that the adjacent spaces are wrapped in the keyname
> element when they should just be adjacent to it, not in it.
>
> This is a pathological case; usually the keyname is correct, but
> occasionally there is a leading or a trailing space, hardly ever both.
>
> I've written a simple stylesheet which corrects this situation,
> identifying leading and trailing whitespace, and outputting the
> appropriate breakdown:
>
>    <xsl:template match="keyname">
>      <xsl:variable name="leading">…</xsl:variable>
>      <xsl:variable name="trailing">…</xsl:variable>
>      <xsl:variable name="content">…</xsl:variable>
>      <xsl:if test="$leading != ''"><xsl:value-of select="$leading"/></xsl:if>
>      <xsl:element name="keyname">
>        <xsl:apply-templates select="@*"/>
>        <xsl:value-of select="$content" />
>      </xsl:element>
>      <xsl:if test="$trailing != ''"><xsl:value-of select="$trailing"/></xsl:if>
>    </xsl:template>
>
> This is all fine, and it's adequate for the job when the "greedy"
> elements only contain text, which is the case for keynames.
>
> However now I want to extend the stylesheet to correct some other
> cases where the content model of the element is not just simple text.
>
> For example:
>
>    <para>Select the<filename> <var>username</var>.profile </filename>file.</para>
>
> Although the cases I am looking at right now only have a content model
> of text or <var> elements, a more general solution would be welcome
> because other cases are going to turn up where elements are nested two
> or three levels deep.
>
> I've got myself neck deep into conditionals trying to extend my simple
> template to cope with this, and I'm sure there's a straightforward way
> of doing it that doesn't need several hundred lines of code.
>
> Can anyone point me to a cleaner way of doing it?
>
> cheers
>
> T
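
For the nested case in the question, one possible approach is to work on text nodes rather than on the element's content as a whole. What follows is only a minimal XSLT 2.0 sketch written for this thread; it is not how emphasis-normalize-space is implemented and it does not cover the footnote case Gerrit mentions. It assumes that the "greedy" elements are listed explicitly (here keyname and filename, so code snippets and other whitespace-sensitive elements are left alone) and that only the outermost greedy element needs fixing when they are nested inside each other.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Outermost greedy element: emit its leading whitespace before it
       and its trailing whitespace after it. The element list is an
       assumption; extend it as needed. -->
  <xsl:template match="*[self::keyname or self::filename]
                        [not(ancestor::keyname or ancestor::filename)]">
    <xsl:variable name="text" select="string(.)"/>
    <!-- Everything before the first non-space character. -->
    <xsl:value-of select="replace($text, '\S.*$', '', 's')"/>
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
    <!-- Everything after the last non-space character; skipped for a
         whitespace-only element so the space is not emitted twice. -->
    <xsl:value-of select="if (normalize-space($text))
                          then replace($text, '^.*\S', '', 's')
                          else ''"/>
  </xsl:template>

  <!-- Text nodes at any depth inside a greedy element: trim the start
       if only whitespace precedes them within that element, and trim
       the end if only whitespace follows them. -->
  <xsl:template match="text()[ancestor::keyname or ancestor::filename]">
    <xsl:variable name="g"
        select="ancestor::*[self::keyname or self::filename][last()]"/>
    <xsl:variable name="before"
        select="string-join($g//text()[. &lt;&lt; current()], '')"/>
    <xsl:variable name="after"
        select="string-join($g//text()[. &gt;&gt; current()], '')"/>
    <xsl:variable name="s1"
        select="if (normalize-space($before) = '')
                then replace(., '^\s+', '') else string(.)"/>
    <xsl:value-of
        select="if (normalize-space($after) = '')
                then replace($s1, '\s+$', '') else $s1"/>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the filename example above, this should produce
<para>Select the <filename><var>username</var>.profile</filename> file.</para>
Because the text template looks at all text inside the greedy element for every text node, it is quadratic in the size of that element, which is unlikely to matter for short inline elements like these.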
