RE: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)

From: "Marc Portier" <mpo@xxxxxxxxxxxxxxxx>
Date: Fri, 11 Jan 2002 00:49:00 +0100

Hi Jeni,

> -----Original Message-----
> From: Jeni Tennison [mailto:jeni@xxxxxxxxxxxxxxxx]
> Sent: donderdag 10 januari 2002 14:05
> To: Marc Portier
> Cc: Steven Noels; xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: Regular expression functions (Was: Re: [xsl] comments on
> December F&O draft)
>
>
> Hi Marc,
>
> > some
> > <regex name="fancy-number">[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?</regex>
> >
> > could then later be used inside
> > <matcher name="" regex="(other groups):fancy-number:(other groups)">
> > ... while nested matchers or output-selecting elements could then use
> > group selections like
> > 1.      <...    select-group="1"> ... or 2, referring to counting the
> > parentheses in the scoped regex of this matcher
> > 2.      <... select-group=":fancy-number:2" >
> > </matcher>
> >
> > could be challenging to implement (spontaneous idea of using the
> > indexes as offsets in counting parentheses)
>
> I like this method better than the Omnimark method of assigning the
> names within the regular expression itself, because it doesn't clutter
> the regular expression (if anything it makes it more readable) and it
> allows regular expressions to be reused.
>
Yep.

> There are a couple of issues that would need to be worked out with it,
> though. What happens if you have a regular expression that involved
> two instances of the named subexpression at the same level:
>
>   <matcher name="two-numbers" regexp=":fancy-number:\w:fancy-number:">
>     ...
>   </matcher>
>
> You need to have separate indexes to indicate which one you're talking
> about, plus some kind of syntax to pull out submatches within the
> named subexpression. Borrowing from XPath syntax (which might be a bad
> idea), you might have:
>
>   fancy-number[2]/*[2]
Yep. I only had a short time online just before I left, when I sent that
reply. It crossed my mind later that one regex could indeed be reused twice
inside another one. Nice to see there is already a syntax familiar to
XSLT users that would help out.

>
> to indicate the second subexpression of the second fancy-number
> subexpression in the matched string.
>
Trying to understand it completely, though.

You mean that *[index] puts all named subregexes into one array and selects
the second one regardless of its name, right?

Getting an actual parenthesized group out of a named subregex would be
different, no?
An example of the nuance I'm seeing: how would I select the exponent group
out of the second matched fancy-number in the following setting?

With no sub-subregexes, only parenthesized groups:
<regex name="fancy-number">[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?</regex>
<matcher name="two-numbers" regexp=":fancy-number:\w:fancy-number:">
     ...
	select-group="fancy-number[2]/2"
     ...
</matcher>

compared to:
<regex name="exponent">[Ee][+-][0-9]+</regex>
<regex name="fractalpart">\.[0-9]+</regex>
<regex name="fancy-number">[0-9]+:fractalpart:?:exponent:?</regex>
<matcher name="two-numbers" regexp=":fancy-number:\w:fancy-number:">
     ...
	select-group="fancy-number[2]/*[2]"
or	select-group="fancy-number[2]/exponent"
     ...
</matcher>
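
To make the first setting concrete, here is a rough Python sketch of the "indexes as offsets" idea. Nothing here is a proposed API: the helper names `count_groups`, `expand` and `select_group` are invented for the illustration, each `:name:` reference is simply wrapped in one extra capturing group, and named regexes that themselves contain `:name:` references (the second setting) are not handled.

```python
import re

# Hypothetical registry of named regexes, mirroring <regex name="...">.
DEFS = {"fancy-number": r"[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?"}
REF = re.compile(r":([A-Za-z][\w-]*):")  # a :name: reference


def count_groups(fragment):
    """Count plain '(' groups, skipping escapes and character classes.

    Simplification: anything starting with '(?' is treated as non-capturing.
    """
    count, i, in_class = 0, 0, False
    while i < len(fragment):
        ch = fragment[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if in_class:
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif ch == "(" and fragment[i + 1:i + 2] != "?":
            count += 1
        i += 1
    return count


def expand(pattern, defs):
    """Splice named regexes in; record the wrapper group index of each use."""
    parts, starts, groups, pos = [], {}, 0, 0
    for ref in REF.finditer(pattern):
        literal = pattern[pos:ref.start()]
        groups += count_groups(literal)
        name = ref.group(1)
        groups += 1  # the wrapper group around this occurrence
        starts.setdefault(name, []).append(groups)
        groups += count_groups(defs[name])
        parts += [literal, "(", defs[name], ")"]
        pos = ref.end()
    parts.append(pattern[pos:])
    return "".join(parts), starts


def select_group(match, starts, name, occurrence, group):
    """Resolve something like fancy-number[2]/2 to an absolute group index."""
    return match.group(starts[name][occurrence - 1] + group)


regex, starts = expand(r":fancy-number:\w:fancy-number:", DEFS)
m = re.fullmatch(regex, "1.5e+3x2.25E-7")
print(select_group(m, starts, "fancy-number", 2, 2))  # E-7
```

The wrapper group is what makes `fancy-number[2]` addressable at all; the `/2` part is then just an offset from that wrapper's index.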

> Actually, that syntax isn't all that bad - you can imagine the matcher
> actually builds up a tree structure based on the subexpression
Yep, though I need some more imagination before actually building it :-)

> matches (you need 'anonymous' elements for unnamed subexpressions, but
> you should be able to get away with that using elements in some
> restricted namespace or something)...
Hmm... I don't understand how we could get unnamed subexpressions.
As far as I can see now, we'd need :name: to slice them in, no?
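
One possible reading of the tree idea, sketched in Python: the plain parenthesized groups inside a named regex would be the unnamed subexpressions. This is only an illustration under assumptions. The element names `match` and `group` and the `value` attribute are made up here, and the group indexes 1 and 4 are hard-coded for this one pattern. ElementTree's limited XPath support happens to be enough to evaluate a path like `fancy-number[2]/*[2]`.

```python
import re
import xml.etree.ElementTree as ET

# fancy-number, wrapped in one extra group so each occurrence is addressable.
FANCY = r"([0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?)"
m = re.fullmatch(FANCY + r"\w" + FANCY, "1.5e+3x2.25E-7")


def build_tree(match):
    """Turn the match into a small tree: named nodes with anonymous children."""
    root = ET.Element("match", value=match.group(0))
    for base in (1, 4):  # wrapper group index of each fancy-number
        named = ET.SubElement(root, "fancy-number", value=match.group(base))
        for offset in (1, 2):  # the two unnamed parenthesized groups
            # An unmatched optional group still gets an (empty) element,
            # so positions stay stable.
            ET.SubElement(named, "group",
                          value=match.group(base + offset) or "")
    return root


tree = build_tree(m)
exponent = tree.find("fancy-number[2]/*[2]").get("value")
print(exponent)  # E-7
```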

>
> > this also makes me think about your earlier mention of dynamic
> > regexes: you probably expect anything that qualifies as a
> > text-representing xsl parameter to possibly carry part of the
> > regex to be executed...
>
> I think that if you could build the named regular expressions
> dynamically, then this idea would work fine. Going back to the keyword
> example that I used on an earlier mail, you could do:
>
> <xsl:regexp name="keyword-as-word"
>             select="concat('\W', $keyword, '\W')" />
>
> If named regular expressions were like variables, you could assign
> them values at the global or local level...
>
Thanks.
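
For comparison, the same dynamic construction can be sketched in Python. The function name `keyword_as_word` is just borrowed from the example above, and escaping the keyword with `re.escape` is my own assumption: the `concat()` version would splice `$keyword` in as regex syntax.

```python
import re


def keyword_as_word(keyword):
    """Build \\W<keyword>\\W at run time, treating the keyword as literal text."""
    return re.compile(r"\W" + re.escape(keyword) + r"\W")


pattern = keyword_as_word("xsl")
print(bool(pattern.search("the xsl list")))   # True
print(bool(pattern.search("the xslt list")))  # False
```

Note that `\W` on both sides needs an actual non-word character, so a keyword at the very start or end of the string is not matched by this pattern.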

> Cheers,
>
> Jeni
>
> ---
> Jeni Tennison
> http://www.jenitennison.com/
>


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

