[xsl] Correcting misplaced spaces in XML documents

From: "Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 26 Mar 2023 01:33:51 -0000

I suppose this falls into the category of data cleanup.

In the simplest case, I am importing documents with content like this:

    <para>Press the<keyname> Escape </keyname>key.</para>

You'll notice that the surrounding spaces are wrapped inside the keyname
element, when they should sit next to it, not in it.

This is a pathological case: usually the keyname is correct, but
occasionally there is a leading or a trailing space, and hardly ever both.

I've written a simple stylesheet that corrects this by identifying the
leading and trailing whitespace and outputting it outside the element:

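  <!-- Assumes an XSLT 2.0 processor, for replace() with regex flags -->
  <xsl:template match="keyname">
    <!-- whitespace before the first non-space character
         (the whole string if the element is all whitespace) -->
    <xsl:variable name="leading" select="replace(., '^(\s*)\S.*$', '$1', 's')"/>
    <!-- whitespace after the last non-space character -->
    <xsl:variable name="trailing"
        select="if (normalize-space(.)) then replace(., '^.*\S(\s*)$', '$1', 's') else ''"/>
    <!-- the text itself with both ends trimmed -->
    <xsl:variable name="content" select="replace(., '^\s+|\s+$', '')"/>

    <xsl:if test="$leading != ''"><xsl:value-of select="$leading"/></xsl:if>
    <xsl:element name="keyname">
      <xsl:apply-templates select="@*"/>
      <xsl:value-of select="$content"/>
    </xsl:element>
    <xsl:if test="$trailing != ''"><xsl:value-of select="$trailing"/></xsl:if>
  </xsl:template>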

This works, and it's adequate for the job when the "greedy" elements
contain only text, which is the case for keynames.

However, I now want to extend the stylesheet to correct some other cases
where the element's content model is not just simple text.

For example:

  <para>Select the<filename> <var>username</var>.profile </filename>file.</para>

Although the cases I am looking at right now have a content model of only
text and <var> elements, a more general solution would be welcome, because
other cases are going to turn up where elements are nested two or three
levels deep.
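
One possible general approach, offered as a sketch under stated assumptions rather than a tested recipe: compute the leading and trailing whitespace from the element's whole string value, output it outside the copied element, and process the element's content in a separate mode in which only the text nodes at the edges are trimmed. A text node counts as being at the start (or end) if no non-whitespace text precedes (or follows) it inside the element, so nesting depth does not matter. This assumes XSLT 2.0 (for example Saxon); the mode name "trim", the tunnel parameter "root", the variable names and the element list "keyname | filename" are illustrative choices, not taken from any existing stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: copy everything no other template handles. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- "Greedy" elements (illustrative list): move edge whitespace outside. -->
  <xsl:template match="keyname | filename">
    <xsl:variable name="text" select="string(.)"/>
    <xsl:variable name="has-content" select="normalize-space($text) != ''"/>
    <!-- whitespace before the first non-space character;
         all of it if the element contains only whitespace -->
    <xsl:value-of select="if ($has-content)
                          then replace($text, '^(\s*)\S.*$', '$1', 's')
                          else $text"/>
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="trim">
        <xsl:with-param name="root" select="." tunnel="yes"/>
      </xsl:apply-templates>
    </xsl:copy>
    <!-- whitespace after the last non-space character -->
    <xsl:if test="$has-content">
      <xsl:value-of select="replace($text, '^.*\S(\s*)$', '$1', 's')"/>
    </xsl:if>
  </xsl:template>

  <!-- Inside a greedy element, copy nested markup unchanged... -->
  <xsl:template match="@*|node()" mode="trim">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="trim"/>
    </xsl:copy>
  </xsl:template>

  <!-- ...but trim text nodes that sit at the edges of its content.
       priority="1" beats the node() template above (both default to -0.5). -->
  <xsl:template match="text()" mode="trim" priority="1">
    <xsl:param name="root" tunnel="yes"/>
    <xsl:variable name="all" select="$root//text()"/>
    <!-- any non-whitespace text before / after this node within $root? -->
    <xsl:variable name="before"
        select="exists($all[. &lt;&lt; current()][normalize-space()])"/>
    <xsl:variable name="after"
        select="exists($all[. &gt;&gt; current()][normalize-space()])"/>
    <xsl:variable name="s1"
        select="if ($before) then string(.) else replace(., '^\s+', '')"/>
    <xsl:value-of select="if ($after) then $s1 else replace($s1, '\s+$', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the two examples above, this should give <para>Press the <keyname>Escape</keyname> key.</para> and <para>Select the <filename><var>username</var>.profile</filename> file.</para>. The regexes each require a non-space character (or a non-empty match), which keeps them clear of the XPath 2.0 rule that replace() may not use a pattern matching the empty string. Comparing every text node with all the others is quadratic, which should not matter for elements this small.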

I've got myself neck deep into conditionals trying to extend my simple
template to cope with this, and I'm sure there's a straightforward way of
doing it that doesn't need several hundred lines of code.

Can anyone point me to a cleaner way of doing it?

cheers

T
