Re: [xsl] recursive replacing strings with nodes

Subject: Re: [xsl] recursive replacing strings with nodes
From: James Cummings <james+xsl@xxxxxxxxxxxxxxxxx>
Date: Fri, 19 Feb 2010 14:24:59 +0000
On Fri, Feb 19, 2010 at 11:57, Martin Honnen <Martin.Honnen@xxxxxx> wrote:
> Here is a stylesheet trying to solve that

Wow, that seems to do what I want!  See comments inline where I try to
understand what is going on (so that when I google for this in a
couple years I can see what I thought was happening!).   Thanks
Martin.  (Someone off-list sent me a perl script that might accomplish
the same thing... but I'd prefer to do it in XSLT if possible ;-) )

For posterity:

> <xsl:stylesheet
>   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>   version="2.0"
>   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>   xmlns:mf="http://example.com/2010/mf"
>   xmlns:functx="http://www.functx.com"
>   exclude-result-prefixes="xsd mf functx">
>
>   <xsl:function name="functx:escape-for-regex" as="xsd:string">
>     <xsl:param name="arg" as="xsd:string?"/>
>
>     <xsl:sequence select="
>       replace($arg,
>               '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))','\\$1')
>     "/>
>   </xsl:function>

This includes functx:escape-for-regex to escape the strings, because I
intentionally made sure some of the strings in my sample had
regex-nasty characters like + and such (because my real input does).
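
To make the escaping concrete, here is a minimal sketch of my own (not
part of Martin's stylesheet). It wraps the same function in a
standalone stylesheet with a named template; the template name "main"
and the sample string 'w+' are assumptions made up for illustration.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:functx="http://www.functx.com"
    exclude-result-prefixes="xsd functx">

  <xsl:output method="text"/>

  <!-- Same function body as in the quoted stylesheet -->
  <xsl:function name="functx:escape-for-regex" as="xsd:string">
    <xsl:param name="arg" as="xsd:string?"/>
    <xsl:sequence select="
      replace($arg,
              '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))','\\$1')
    "/>
  </xsl:function>

  <!-- Hypothetical entry point: start the transformation at this
       named template (no source document is needed) -->
  <xsl:template name="main">
    <!-- 'w+' is a made-up abbreviation; used unescaped as a regex it
         would mean "one or more w" instead of the literal text w+ -->
    <xsl:value-of select="functx:escape-for-regex('w+')"/>
    <!-- writes: w\+ -->
  </xsl:template>

</xsl:stylesheet>
```

With Saxon this should be runnable by naming the initial template on
the command line (the -it option), though I have only reasoned through
it rather than treating it as a tested recipe.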

>   <xsl:param name="abbr-url" as="xsd:string"
>     select="'test2010021902.xml'"/>
>   <xsl:variable name="abbr" as="element(abbr)*"
>     select="doc($abbr-url)/root/choice/abbr"/>

Load the abbr elements from the lookup-table file into a variable,
keeping them as elements (so each one's parent choice, and through it
the expan, is still reachable later).
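
For reference, the paths used here (root/choice/abbr, and later
../expan/w) imply a lookup file shaped roughly like the sketch below.
The two entries are invented placeholders, not my real data; only the
element names come from the stylesheet.

```xml
<!-- test2010021902.xml: hypothetical content, shape inferred from
     the XPath expressions in the stylesheet -->
<root>
  <choice>
    <abbr>w+</abbr>
    <expan><w>with</w></expan>
  </choice>
  <choice>
    <abbr>yt</abbr>
    <expan><w>that</w></expan>
  </choice>
</root>
```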

Define the mf:replace function:
>   <xsl:function name="mf:replace" as="node()*">
>     <xsl:param name="str" as="xsd:string"/>
>     <xsl:param name="abbr" as="element(abbr)*"/>
which has two parameters: a string and a sequence of abbr elements

>     <xsl:choose>
>       <xsl:when test="$abbr">
>         <xsl:analyze-string select="$str"
>           regex="{functx:escape-for-regex($abbr[1])}">

If there are any abbr elements left, analyze the string provided,
looking for the first abbr (escaped so that any regex characters in it
match literally).

>           <xsl:matching-substring>
>             <xsl:copy-of select="$abbr[1]/../expan/w"/>
>           </xsl:matching-substring>

When it matches, go up to the abbr's parent and copy the w element(s)
inside its expan.
>           <xsl:non-matching-substring>
>             <xsl:sequence select="mf:replace(., $abbr[position() gt 1])"/>
>           </xsl:non-matching-substring>

When it doesn't match, pass the non-matching substring to a recursive
call of mf:replace(), this time with only the remaining abbr elements
(everything after the first).

>         </xsl:analyze-string>
>       </xsl:when>
>       <xsl:otherwise>
>         <xsl:value-of select="$str"/>
>       </xsl:otherwise>

If there are no abbr elements left in $abbr, then put out the string
unchanged.
>     </xsl:choose>
>   </xsl:function>
>

standard copy-all template:
>   <xsl:template match="@* | node()">
>     <xsl:copy>
>       <xsl:apply-templates select="@*, node()"/>
>     </xsl:copy>
>   </xsl:template>
>

Every time you come across a seg, kick this off by copying it and
filling it with the sequence of nodes returned by mf:replace() for its
text and the full list of abbr elements.
>   <xsl:template match="seg">
>     <xsl:copy>
>       <xsl:sequence select="mf:replace(., $abbr)"/>
>     </xsl:copy>
>   </xsl:template>
>
> </xsl:stylesheet>
>
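
As a worked example of how I read the recursion (a sketch, not output
I am quoting from a real run): assume a lookup table whose first abbr
is "w+" with expansion <w>with</w> and whose second abbr is "yt" with
expansion <w>that</w>. Both entries and the seg content are made up
for illustration.

```xml
<!-- Two separate fragments, not one document. -->

<!-- Input fragment -->
<seg>he said yt w+ care</seg>

<!-- How mf:replace() works through it:
     1. abbr list (w+, yt): regex w\+ splits the string into
        'he said yt ' | match 'w+' | ' care'
     2. the match is replaced by a copy of <w>with</w>
     3. 'he said yt ' goes to mf:replace(., (yt)), which splits it into
        'he said ' | match 'yt' | ' ', and the match becomes <w>that</w>
     4. every remaining piece reaches mf:replace(., ()) and is
        written out as plain text -->

<!-- Result fragment -->
<seg>he said <w>that</w> <w>with</w> care</seg>
```

One thing to keep in mind: the seg template hands the string value of
the seg to mf:replace(), so as far as I can tell any attributes or
child elements the seg already had would not survive this template.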


Thanks Martin!

-James
