Re: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)

From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Thu, 10 Jan 2002 13:05:07 +0000

Hi Marc,

> some
> <regex name="fancy-number">[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?</regex>
> could then later be used inside
> <matcher name="" regex="(other groups):fancy-number:(other groups)">
> ... while nested matchers or output-selecting elements could then use group
> selections like
> 1.      <...    select-group="1"> ... or 2 referring to counting the parentheses in
> the scoped regex of this matcher
> 2.      <... select-group=":fancy-number:2" >
> </matcher>
> could be challenging to implement (spontaneous idea of using the
> indexes as offsets in counting parentheses)

I like this method better than the Omnimark method of assigning the
names within the regular expression itself, because it doesn't clutter
the regular expression (if anything it makes it more readable) and it
allows regular expressions to be reused.
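
To make the reuse idea concrete, here is a rough sketch in Python
rather than XSLT. It is only an illustration under my own assumptions,
not a proposal for how a processor would do it: the NAMED dictionary
stands in for the <regex name="..."> declarations, expand() is a
made-up helper, and the rule that a reference looks like ':name:' with
letters, digits and hyphens is a guess at the syntax. Anything that
looks like a reference but is not a declared name is left untouched.

```python
import re

# Hypothetical registry standing in for <regex name="..."> declarations.
NAMED = {
    "fancy-number": r"[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?",
}

# Assumed reference syntax: a declared name between two colons.
REFERENCE = re.compile(r":([A-Za-z][A-Za-z0-9-]*):")


def expand(pattern, named=NAMED):
    """Replace each :name: reference with its definition, wrapped in a group."""
    def substitute(ref):
        name = ref.group(1)
        if name not in named:
            return ref.group(0)  # not a declared name, leave it alone
        return "(" + named[name] + ")"
    return REFERENCE.sub(substitute, pattern)


# The matcher's own groups sit on either side of the named subexpression.
expanded = expand(r"(x=):fancy-number:(;)")
match = re.fullmatch(expanded, "x=1.5E+10;")

print(expanded)        # the regex actually handed to the engine
print(match.group(2))  # the whole fancy-number
```

Note that wrapping the named expression in its own group pushes the
matcher's later groups further along: the ';' group is written second
in the matcher but ends up as group 5 after expansion. That is the
offset bookkeeping you mentioned.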

There are a couple of issues that would need to be worked out with it,
though. What happens if you have a regular expression that involves
two instances of the named subexpression at the same level:

  <matcher name="two-numbers" regexp=":fancy-number:\w:fancy-number:">

You need to have separate indexes to indicate which one you're talking
about, plus some kind of syntax to pull out submatches within the
named subexpression. Borrowing from XPath syntax (which might be a bad
idea), you might have:


to indicate the second subexpression of the second fancy-number
subexpression in the matched string.

Actually, that syntax isn't all that bad - you can imagine the matcher
actually builds up a tree structure based on the subexpression
matches (you need 'anonymous' elements for unnamed subexpressions, but
you should be able to get away with that using elements in some
restricted namespace or something)...
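
As a rough sketch of that tree, again in Python and again under my own
assumptions: count_groups(), expand() and build_tree() are made-up
helpers, the tree is just a list of dictionaries, and only plain
'(...)' groups are counted (no named groups, no named expressions that
refer to other named expressions). It records where each named
instance lands in the expanded regex, then hangs that instance's own
subexpressions underneath it.

```python
import re

# Hypothetical stand-in for the <regex name="..."> declarations.
NAMED = {"fancy-number": r"[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?"}
REFERENCE = re.compile(r":([A-Za-z][A-Za-z0-9-]*):")


def count_groups(fragment):
    """Count plain capturing '(' in a fragment, skipping escapes,
    character classes and '(?...)' constructs."""
    count, i, in_class = 0, 0, False
    while i < len(fragment):
        ch = fragment[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if in_class:
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif ch == "(" and fragment[i + 1:i + 2] != "?":
            count += 1
        i += 1
    return count


def expand(pattern, named=NAMED):
    """Expand :name: references and record (name, outer group, inner count)."""
    out, instances, pos = "", [], 0
    for ref in REFERENCE.finditer(pattern):
        name = ref.group(1)
        if name not in named:
            continue  # not a declared name, copied through unchanged
        out += pattern[pos:ref.start()]
        outer = count_groups(out) + 1  # group number of the wrapper
        instances.append((name, outer, count_groups(named[name])))
        out += "(" + named[name] + ")"
        pos = ref.end()
    return out + pattern[pos:], instances


def build_tree(match, instances):
    """One node per named instance; children are its own subexpressions."""
    return [
        {
            "name": name,
            "text": match.group(outer),
            "children": [match.group(outer + k) for k in range(1, inner + 1)],
        }
        for name, outer, inner in instances
    ]


regex, instances = expand(r":fancy-number:\w:fancy-number:")
match = re.fullmatch(regex, "1.5x2.0E+3")
tree = build_tree(match, instances)

# Second subexpression of the second fancy-number in the matched string.
numbers = [node for node in tree if node["name"] == "fancy-number"]
second_of_second = numbers[1]["children"][1]
print(second_of_second)
```

A subexpression that did not take part in the match (the exponent of
the first number here) shows up as None in the children list, so the
positions stay stable whether or not an optional part matched.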

> this also makes me think about your earlier mentioning of dynamic
> regexes: you probably expect anything that qualifies as a
> text-representing xsl parameter to be possibly carrying part of the
> regex to be executed...

I think that if you could build the named regular expressions
dynamically, then this idea would work fine. Going back to the keyword
example that I used in an earlier mail, you could do:

<xsl:regexp name="keyword-as-word"
            select="concat('\W', $keyword, '\W')" />

If named regular expressions were like variables, you could assign
them values at the global or local level...
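
For comparison, the same dynamic construction can be sketched in
Python. This is only an analogy, not the proposed XSLT: the function
name keyword_as_word() is made up to mirror the name attribute above,
and the call to re.escape() is my own addition so that a keyword
containing characters like '+' or '.' is matched literally rather than
being read as regex syntax.

```python
import re


def keyword_as_word(keyword):
    """Build a non-word character, the keyword, then a non-word character."""
    # re.escape is an assumption on top of the concat() version: it keeps
    # regex metacharacters in the keyword from changing the pattern.
    return re.compile(r"\W" + re.escape(keyword) + r"\W")


pattern = keyword_as_word("xsl")

hit = pattern.search("the (xsl) list")   # keyword delimited by non-word chars
miss = pattern.search("the xslt list")   # keyword is part of a longer word
edge = pattern.search("xsl list")        # keyword at the very start

print(hit.group(0) if hit else None)
print(miss)
print(edge)
```

One thing the sketch brings out: because '\W' has to consume a
character on each side, a keyword at the very start or end of the
string is not matched by this pattern.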



Jeni Tennison
