Re: [xsl] Correcting misplaced spaces in XML documents

Subject: Re: [xsl] Correcting misplaced spaces in XML documents
From: "Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 27 Mar 2023 19:59:15 -0000
Not necessarily a helpful response, but I'm finding that using XQuery Update
with, e.g., Oxygen XML refactoring operations, to do this kind of markup
cleanup is easier and more reliable than the equivalent XSLT, especially if
it's a one-off data cleanup rather than something you need to do repeatedly.

This is because the XQuery update is an in-place change that only changes the
precise thing you selected and nothing else; unintended changes are always a
danger with XSLT, no matter how simple the transform. That makes testing
easier: once you've proven your target node selector and verified your update
expression, there's really no way for the update to go wrong.

Unfortunately, XQuery update with Saxon requires an EE license, which means
you either run it from Oxygen or acquire a license (definitely worth the cost
if you need it and can otherwise justify the cost).

You can also use XQuery databases like BaseX to do the updating but that takes
a bit more work to set up the writing of the result back out and the database
may not always preserve all the document details the way Oxygen or Saxon
will.

Cheers,

E.

_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com<https://www.servicenow.com>

From: Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Saturday, March 25, 2023 at 8:40 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: [xsl] Correcting misplaced spaces in XML documents

I suppose this falls into the category of data cleanup.

In the very simple case I am importing documents which have content like
this:

    <para>Press the<keyname> Escape </keyname>key.</para>

You'll notice that the adjacent spaces are wrapped in the keyname element when
they should just be adjacent to it, not in it.
This is a pathological case; usually the keyname is correct, but occasionally
there is a leading or a trailing space, hardly ever both.

I've written a simple stylesheet which corrects this situation, identifying
leading and trailing whitespace, and outputting the appropriate breakdown:

  <xsl:template match="keyname">
    <xsl:variable name="leading"></xsl:variable>
    <xsl:variable name="trailing"></xsl:variable>
    <xsl:variable name="content"></xsl:variable>
    <xsl:if test="$leading != ''"><xsl:value-of select="$leading"/></xsl:if>
    <xsl:element name="keyname">
      <xsl:apply-templates select="@*"/>
      <xsl:value-of select="$content"/>
    </xsl:element>
    <xsl:if test="$trailing != ''"><xsl:value-of select="$trailing"/></xsl:if>
  </xsl:template>

This is all fine, and it's adequate for the job when the "greedy" elements
only contain text, which is the case for keynames.

However now I want to extend the stylesheet to correct some other cases where
the content model of the element is not just simple text.
For example:

  <para>Select the<filename> <var>username</var>.profile </filename>file.</para>

Although the cases I am looking at right now only have a content model of text
or <var> elements, a more general solution would be welcome because other
cases are going to turn up where elements are nested two or three levels
deep.
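
For illustration only, here is a minimal sketch of one way such a general
solution might look. It assumes XSLT 2.0 or later (tunnel parameters, the
`is` operator, `replace()`), and the list of "greedy" elements in the match
pattern is a placeholder to adjust. The idea is to take the leading whitespace
from the first descendant text node and the trailing whitespace from the last
one, emit them outside the element, and trim those two text nodes while
copying everything else unchanged, however deeply it is nested.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template, in every mode: copy whatever is not handled below. -->
  <xsl:template match="@* | node()" mode="#all">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: the "greedy" inline elements are listed here by name. -->
  <xsl:template match="keyname | filename">
    <!-- First and last text nodes at any depth inside the element. -->
    <xsl:variable name="first" select="(.//text())[1]"/>
    <xsl:variable name="last" select="(.//text())[last()]"/>
    <!-- If one text node is both first and last, strip its leading
         whitespace before measuring the trailing whitespace, so a
         whitespace-only node is not emitted twice. -->
    <xsl:variable name="tail-source"
        select="if ($first is $last) then replace($last, '^\s+', '')
                else string($last)"/>

    <!-- Leading whitespace, moved in front of the element. -->
    <xsl:value-of select="substring($first, 1,
        string-length($first) - string-length(replace($first, '^\s+', '')))"/>

    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="trim">
        <xsl:with-param name="first" select="$first" tunnel="yes"/>
        <xsl:with-param name="last" select="$last" tunnel="yes"/>
      </xsl:apply-templates>
    </xsl:copy>

    <!-- Trailing whitespace, moved behind the element. -->
    <xsl:value-of select="substring($tail-source,
        string-length(replace($tail-source, '\s+$', '')) + 1)"/>
  </xsl:template>

  <!-- In trim mode, only the first and last text nodes are altered.
       The explicit priority avoids a conflict with the identity template. -->
  <xsl:template match="text()" mode="trim" priority="1">
    <xsl:param name="first" tunnel="yes"/>
    <xsl:param name="last" tunnel="yes"/>
    <xsl:variable name="start-trimmed"
        select="if (. is $first) then replace(., '^\s+', '') else string(.)"/>
    <xsl:value-of
        select="if (. is $last) then replace($start-trimmed, '\s+$', '')
                else $start-trimmed"/>
  </xsl:template>

</xsl:stylesheet>
```

Under these assumptions the filename example above should come out as
`<para>Select the <filename><var>username</var>.profile</filename> file.</para>`.
The sketch only inspects the single first and last text nodes, so whitespace
spread over several adjacent text nodes, or a greedy element nested inside
another greedy element, would need extra handling.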

I've got myself neck deep into conditionals trying to extend my simple
template to cope with this, and I'm sure there's a straightforward way of
doing it that doesn't need several hundred lines of code.

Can anyone point me to a cleaner way of doing it?

cheers
T
