Re: [xsl] Find/replace algorithm

Subject: Re: [xsl] Find/replace algorithm
From: "Joel Kalvesmaki director@xxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 28 Mar 2021 09:26:08 -0000
Hi Jim,

If you can express the substitutions as proper XSLT regular expressions (i.e., no \b), then you could add as one more possibility the use of tan:batch-replace(), which allows you to stay within XSLT.

One passes as a parameter a sequence of elements whose attributes correspond to the parameters of fn:replace() (here, @pattern and @replacement). The elements are processed in sequence order, so each replacement operates on the output of the previous one. An extra @message attribute can provide simple feedback as changes are made. Your example:

   <xsl:variable name="my-batch-replaces" as="element()*">
      <replace pattern="(\W)Wid(\W)" replacement="$1Widget$2"/>
      <replace pattern="(\W)Assbly(\W)" replacement="$1Assembly$2"/>
      <replace pattern="(\W)Eng(\W)" replacement="$1Engine$2"/>
   </xsl:variable>

   <xsl:template match="text()" mode="batch-replace">
      <xsl:value-of select="tan:batch-replace(., $my-batch-replaces)"/>
   </xsl:template>
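
For anyone who wants to see the idea without pulling in the library, here is a minimal XSLT 3.0 sketch of sequential replacement. It is not the TAN implementation: the function name local:batch-replace and its namespace URI are made up for illustration, @flags is assumed to be an optional attribute, and @message is ignored entirely.

   <xsl:stylesheet version="3.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:local="urn:example:local"
      exclude-result-prefixes="#all">

      <!-- Hypothetical stand-in for tan:batch-replace(); not the TAN code. -->
      <xsl:function name="local:batch-replace" as="xs:string">
         <xsl:param name="input" as="xs:string?"/>
         <xsl:param name="replaces" as="element()*"/>
         <!-- Fold over the <replace> elements in sequence order; each step
              receives the string produced by the step before it. A missing
              @flags becomes the empty string, i.e. no flags. -->
         <xsl:sequence select="
            fold-left($replaces, string($input),
               function ($so-far, $r) {
                  replace($so-far, $r/@pattern, $r/@replacement, string($r/@flags))
               })"/>
      </xsl:function>

   </xsl:stylesheet>

With the variable above, local:batch-replace(' Wid Assbly ', $my-batch-replaces) should return ' Widget Assembly '.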

Over the last several years, in my day-to-day use on XML files of punishing complexity, the function has proved quite efficient, even when I ask for numerous replacements with some crazy regular expressions.

The code (don't forget the submodules):
https://github.com/textalign/TAN-2020/blob/b71f9f93030232911a83508e88f0d184d88fbe00/functions/incl/TAN-core-string-functions.xsl

One more arrow in your quiver....

jk
--
Joel Kalvesmaki
Director, Text Alignment Network
http://textalign.net

I use this program along with XSLT.  I choose the best from each
world.  Right now I am converting a dictionary in Word format to be
input into FLEx (our linguistic tool) for handling dictionaries. I
export Word to XHTML and then process with the context in mind with
XSLT.  After I have done as much as I can here I switch to CC. There
are lots of items that are better handled with the changes tables.

Jim Albright
Wycliffe Bible Translators
704-562-1529

On Thu, Mar 25, 2021 at 1:35 PM Liam R. E. Quin liam@xxxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

On Thu, 2021-03-25 at 16:29 +0000, rick@xxxxxxxxxxxxxx wrote:
Thank you Michael. I like the idea of keeping the processing cost
constant but I was going to use regular expressions in my map, so I
may still have to loop through the lookup structure.

An alternative to consider is to put your input document into a map and check each map key against all the tokens.
