Subject: Re: [xsl] String contains a regex and then junk ... how to remove the junk?
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 16 Dec 2024 13:41:25 -0000
A good case for Invisible XML, though sadly we don't have it integrated into
Saxon yet.

The first step here is finding a matching closing paren. The second step is
dealing with backslash-escaped parens.

For the first step, I would use xsl:iterate, iterating over the characters of
the string (in 4.0 use the fn:characters function, in 3.0 use
string-to-codepoints). Maintain a variable $depth over the iteration:
increment it on a left paren, decrement it on a right paren, and break out of
the iteration when the depth returns to zero.
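
A minimal XSLT 3.0 sketch of that first step, offered as an illustration
rather than a tested solution. The function name f:balanced-prefix and the
namespace urn:example:functions are made up for the example, and it assumes
the string starts with a left paren. If no balanced group is found, it simply
returns the input unchanged.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper: returns the prefix of $string up to and
       including the paren that closes the first balanced group. -->
  <xsl:function name="f:balanced-prefix" as="xs:string">
    <xsl:param name="string" as="xs:string"/>
    <xsl:iterate select="string-to-codepoints($string)">
      <xsl:param name="depth" as="xs:integer" select="0"/>
      <xsl:on-completion>
        <!-- Ran off the end without balancing: return the input as-is -->
        <xsl:sequence select="$string"/>
      </xsl:on-completion>
      <xsl:variable name="char" as="xs:string"
          select="codepoints-to-string(.)"/>
      <xsl:variable name="new-depth" as="xs:integer"
          select="if ($char eq '(') then $depth + 1
                  else if ($char eq ')') then $depth - 1
                  else $depth"/>
      <xsl:choose>
        <!-- A right paren that brings the depth back to zero ends the regex -->
        <xsl:when test="$char eq ')' and $new-depth eq 0">
          <xsl:break select="substring($string, 1, position())"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:next-iteration>
            <xsl:with-param name="depth" select="$new-depth"/>
          </xsl:next-iteration>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:iterate>
  </xsl:function>

</xsl:stylesheet>
```

Called as f:balanced-prefix(string(REG_EXP)), this should give
(([\W\w]{1,80})?) for the first example in the question, since position()
inside xsl:iterate is the index of the current character.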

Then handling backslashes is just an extra bit of logic: in your xsl:iterate,
define a second variable that indicates whether the immediately preceding
character is a backslash (or rather, an unescaped backslash) and avoid
recognizing parens if it is.
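
Again only as a sketch under the same assumptions (hypothetical function name
and namespace, XSLT 3.0), the extra state might look like this. The second
parameter $escaped is true when the previous character was an unescaped
backslash. Parens inside character classes such as [()] are not treated
specially here.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper: like a plain depth counter, but parens that
       follow an unescaped backslash do not change the depth. -->
  <xsl:function name="f:balanced-prefix-esc" as="xs:string">
    <xsl:param name="string" as="xs:string"/>
    <xsl:iterate select="string-to-codepoints($string)">
      <xsl:param name="depth" as="xs:integer" select="0"/>
      <xsl:param name="escaped" as="xs:boolean" select="false()"/>
      <xsl:on-completion>
        <!-- Ran off the end without balancing: return the input as-is -->
        <xsl:sequence select="$string"/>
      </xsl:on-completion>
      <xsl:variable name="char" as="xs:string"
          select="codepoints-to-string(.)"/>
      <xsl:variable name="new-depth" as="xs:integer"
          select="if ($escaped) then $depth
                  else if ($char eq '(') then $depth + 1
                  else if ($char eq ')') then $depth - 1
                  else $depth"/>
      <xsl:choose>
        <!-- Only an unescaped right paren can close the group -->
        <xsl:when test="not($escaped) and $char eq ')' and $new-depth eq 0">
          <xsl:break select="substring($string, 1, position())"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:next-iteration>
            <xsl:with-param name="depth" select="$new-depth"/>
            <!-- A backslash escapes the next character only if it is
                 not itself escaped -->
            <xsl:with-param name="escaped"
                select="not($escaped) and $char eq '\'"/>
          </xsl:next-iteration>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:iterate>
  </xsl:function>

</xsl:stylesheet>
```

Resetting $escaped after one character is what makes a doubled backslash
followed by a paren count as a real paren.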

Michael Kay
Saxonica



> On 16 Dec 2024, at 13:24, Roger L Costello costello@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Folks,
>
> I want to convert this:
>
> <REG_EXP>(([\W\w]{1,80})?) &lt;INFO&gt;</REG_EXP>
>
> to this:
>
> <REG_EXP>(([\W\w]{1,80})?)</REG_EXP>
>
> Convert this:
>
> <REG_EXP>([A-Z]{2}[0-9A-Z ]{0,13}) &lt;ARF ID&gt;</REG_EXP>
>
> to this:
>
> <REG_EXP>([A-Z]{2}[0-9A-Z ]{0,13})</REG_EXP>
>
> I want to remove the junk that follows the regex.
>
> I wrote a recursive function to do this. See below. Is there a simpler
> way to do it?
>
> -------------------------------------
> <xsl:function name="f:get-regex">
>    <xsl:param name="string"/>
>    <xsl:choose>
>        <xsl:when test="substring($string,1,1) ne '('">
>            <xsl:message>Error! Expecting the regex to start with left paren</xsl:message>
>        </xsl:when>
>        <xsl:otherwise>
>            <!-- The helper's result already starts with the opening paren -->
>            <xsl:value-of select="f:get-regex-helper($string,2,1)"/>
>        </xsl:otherwise>
>    </xsl:choose>
> </xsl:function>
>
> <xsl:function name="f:get-regex-helper">
>    <xsl:param name="string"/>
>    <xsl:param name="index"/>
>    <xsl:param name="count-left-parens-to-match"/>
>    <xsl:choose>
>        <xsl:when test="$count-left-parens-to-match eq 0">
>            <xsl:value-of select="substring($string,1,$index - 1)"/>
>        </xsl:when>
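>        <!-- Nested left paren: one more right paren to match -->
>        <xsl:when test="substring($string,$index,1) eq '('">
>            <xsl:value-of select="f:get-regex-helper($string,$index+1,$count-left-parens-to-match + 1)"/>
>        </xsl:when>
>        <xsl:when test="substring($string,$index,1) eq ')'">
>            <xsl:value-of select="f:get-regex-helper($string,$index+1,$count-left-parens-to-match - 1)"/>
>        </xsl:when>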
>        <xsl:otherwise>
>            <xsl:value-of select="f:get-regex-helper($string,$index+1,$count-left-parens-to-match)"/>
>        </xsl:otherwise>
>    </xsl:choose>
> </xsl:function>
