Re: [xsl] String contains a regex and then junk ... how to remove the junk?

Subject: Re: [xsl] String contains a regex and then junk ... how to remove the junk?
From: "David Carlisle d.p.carlisle@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 16 Dec 2024 14:08:51 -0000
Is it really "junk"? That seems hard to define (as more or less any list of
characters is a legal fragment of regex, matching itself). Is

[a-z]+JUNK  the regex [a-z]+ followed by JUNK or the regex [a-z]+JUNK  ?

Or do you just want to strip a trailing <KEYWORDS HERE>? That is easier,
but I wouldn't describe it as JUNK if it's matching a specific angle
bracket syntax.
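
If the trailing part really is a single <...> annotation, stripping it could
be sketched as below. This is a minimal Python sketch rather than XSLT, and
the name strip_trailing_annotation is made up for illustration. It assumes
the annotation is the last thing in the string and contains no nested angle
brackets. It cannot tell an annotation from a regex that itself happens to
end in <...>, which is the same ambiguity as above.

```python
import re

# Assumption: the unwanted tail is one <...> group at the very end of the
# string, optionally surrounded by whitespace, with no nested < or >.
TRAILING_ANNOTATION = re.compile(r"\s*<[^<>]*>\s*$")


def strip_trailing_annotation(text: str) -> str:
    """Return text without a single trailing <...> annotation, if present."""
    return TRAILING_ANNOTATION.sub("", text, count=1)


if __name__ == "__main__":
    print(strip_trailing_annotation("[a-z]+ <KEYWORDS HERE>"))
    print(strip_trailing_annotation("[a-z]+JUNK"))
```

The same pattern should carry over to XPath's replace() function, with <
written as &lt; inside a stylesheet.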



On Mon, 16 Dec 2024 at 13:43, Norm Tovey-Walsh ndw@xxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> > I wrote a recursive function to do this. See below. Is there a
> > simpler way to do it?
>
> If the regex is always in parens, and if the junk that follows never
> contains a ")", then just look for the last ")".
>
> If the regex is always in parens, but the junk might include "(" and/or
> ")" then it's going to be harder.
>
> If the regex isn't always in parens, ... I'm not sure the problem is
> tractable. A string of the form "abcd" could be interpreted several ways
> depending on whether "bcd", "cd", "d", or "" is considered junk.
>
> On a quick skim, I wasn't able to persuade myself that your recursive
> solution was handling escaped parens, if that's an issue.
>
> Assuming the regex is always in parens, I cooked up this ixml grammar in a
> moment or two, but it doesn't handle escaped parens either.
>
> text = regex, junk? .
> regex = '(', inner*, ')' .
> -inner = -regex | ~["()"] .
> junk = ~[]* .
>
>                                         Be seeing you,
>                                           norm
>
> --
> Norm Tovey-Walsh <ndw@xxxxxxxxxx>
> https://norm.tovey-walsh.com/
>
> > Weeks of programming can save you hours of planning.
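
For the parenthesised case, a single left-to-right scan that counts paren
depth and skips backslash-escaped characters is one way to cover the escaped
parens that the grammar above leaves out. The following is only a sketch in
Python, not ixml or XSLT, and split_regex_and_junk is a hypothetical name.
It assumes the regex starts at the first character with "(" and that parens
inside character classes such as [()] are escaped. Unescaped ones would be
miscounted.

```python
from typing import Tuple


def split_regex_and_junk(text: str) -> Tuple[str, str]:
    """Split text into a leading parenthesised regex and whatever follows."""
    if not text.startswith("("):
        raise ValueError("expected the text to start with '('")
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # A backslash escapes the next character, so skip both.
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # The outermost group just closed; the rest is junk.
                return text[: i + 1], text[i + 1:]
        i += 1
    raise ValueError("unbalanced parentheses")


if __name__ == "__main__":
    regex, junk = split_regex_and_junk(r"(a(b|c)\)d)JUNK (x)")
    print(regex)
    print(junk)
```

If the junk never contains ")", the simpler "last close paren" rule from the
quoted message gives the same split without any depth counting.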
