RE: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)

Subject: RE: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)
From: "Marc Portier" <mpo@xxxxxxxxxxxxxxxx>
Date: Thu, 10 Jan 2002 12:33:38 +0100

Hi Jeni,

> I can see the advantages to having the regular expressions close to
> the code that's generated from the regular expression - it makes it a
> lot easier to understand what's going on, especially if you're
> addressing sub-expressions.
>
> On the other hand, if you have a standard regular expression, perhaps
> something that you use in a lot of other regular expressions, it would
> be handy to have that regular expression stored somewhere separate.
> As a simple example, say I had a regular expression that matched
> numbers in scientific notation:
>
>   [0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?
>
> That's a bit of a mouthful to insert in all the regular expressions
> where I want to test that part of the string is a number in scientific
> notation. It would be handy if I could store that somewhere and just
> call on it as required.
>
> As I say, the problem with doing that is those ()s - I need to know
> what ()s are used where in order to tell what subexpressions I'm
> matching.
>
> This could be solved in two ways:
>
>   - introducing a syntax (to XML Schema regular expressions - perhaps
>     you already have it) for non-capturing matches
>   - introducing a syntax for naming the subexpressions rather than
>     numbering them
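Both of Jeni's options exist in some other regex flavours. As an illustration only (Python's `re` module, not XML Schema regex syntax), `(?:...)` marks a non-capturing group and `(?P<name>...)` a named group. The variable names below are made up for the example.

```python
import re

# Jeni's scientific-notation pattern with its inner groups made
# non-capturing, so embedding it adds no numbered groups.
SCI_NONCAPTURING = r"[0-9]+(?:\.[0-9]+)?(?:[Ee][+-][0-9]+)?"

# The same pattern using named groups, so its parts are addressed
# by name instead of by counting parentheses.
SCI_NAMED = r"(?P<mantissa>[0-9]+(?:\.[0-9]+)?)(?P<exponent>[Ee][+-][0-9]+)?"

# Option 1: embed the non-capturing version; only the outer groups count.
range_re = re.compile(r"(" + SCI_NONCAPTURING + r")-(" + SCI_NONCAPTURING + r")")
m = range_re.fullmatch("1.5E+3-2.0")
print(m.group(1), m.group(2))  # 1.5E+3 2.0

# Option 2: use the named version; group numbering no longer matters.
m2 = re.fullmatch(r"x=" + SCI_NAMED, "x=6.02E+23")
print(m2.group("mantissa"), m2.group("exponent"))  # 6.02 E+23
```

One catch with the named-group route: Python's `re` rejects a pattern in which the same group name appears twice, so `SCI_NAMED` cannot simply be spliced into one regex twice. That kind of name clash is what the prefix idea in this message tries to avoid.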

Great example, and it points right at the () problem...
It kind of pushes toward the second solution, if you ask me.

Something like

<regex name="fancy-number">[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?</regex>

could then be used later inside a matcher:

<matcher name="" regex="(other groups):fancy-number:(other groups)">
  ... where nested matchers or output-selecting elements could use group
  selections like
  1. <... select-group="1"> (or "2"), counting the parentheses in the
     scoped regex of this matcher
  2. <... select-group=":fancy-number:2">, selecting group 2 of the
     named fancy-number regex
</matcher>

This could be challenging to implement (my spontaneous idea: use the indexes
as offsets when counting parentheses).
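A minimal sketch of that offset idea, in Python rather than XSLT and under stated assumptions: a hypothetical `REGISTRY` dict stands in for the `<regex name="...">` elements, `:name:` is the reference syntax from the example above, and `expand`/`count_groups` are invented names. Each reference is expanded in place, and the number of capturing groups opened before it becomes the offset that turns `:fancy-number:2` into an absolute group number, while plain numbers keep counting only the parentheses written in the scoped regex.

```python
import re

# Hypothetical registry standing in for <regex name="..."> elements.
REGISTRY = {
    "fancy-number": r"[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?",
}

# Assumed reference syntax: a registered name between colons, e.g. :fancy-number:
# (simplification: literal colons in the scoped regex would be misread).
REF = re.compile(r":([A-Za-z][\w-]*):")


def count_groups(fragment):
    """Count capturing '(' in a regex fragment, skipping escapes,
    character classes and '(?' constructs. Works on unbalanced fragments."""
    count, i, in_class = 0, 0, False
    while i < len(fragment):
        c = fragment[i]
        if c == "\\":
            i += 2  # skip the escaped character
            continue
        if in_class:
            in_class = c != "]"
        elif c == "[":
            in_class = True
        elif c == "(" and not fragment.startswith("?", i + 1):
            count += 1
        i += 1
    return count


def expand(scoped):
    """Expand :name: references in a scoped regex. Returns the full pattern
    and a table mapping selectors ('1', ':fancy-number:2', ...) to absolute
    group numbers in that pattern."""
    pieces, table = [], {}
    total = outer = 0  # groups so far: in the full pattern / in the scoped regex
    pos = 0

    def literal(chunk):
        nonlocal total, outer
        for _ in range(count_groups(chunk)):
            total += 1
            outer += 1
            table[str(outer)] = total
        pieces.append(chunk)

    for ref in REF.finditer(scoped):
        literal(scoped[pos:ref.start()])
        name = ref.group(1)
        sub = REGISTRY[name]
        n = count_groups(sub)
        for k in range(1, n + 1):
            # The offset: group k of the sub-regex sits k places after
            # all groups opened before the reference.
            table.setdefault(f":{name}:{k}", total + k)  # first use of a name wins
        total += n
        pieces.append(f"(?:{sub})")  # non-capturing wrapper adds no group
        pos = ref.end()
    literal(scoped[pos:])
    return "".join(pieces), table


pattern, table = expand(r"([a-z]+)=:fancy-number:;([a-z]+)")
m = re.fullmatch(pattern, "mass=6.02E+23;kg")
print(table)  # {'1': 1, ':fancy-number:1': 2, ':fancy-number:2': 3, '2': 4}
print(m.group(table[":fancy-number:2"]))  # E+23
print(m.group(table["2"]))  # kg
```

The group counter is deliberately simple: it treats every `(?` construct as non-capturing, which would be wrong for Python's own `(?P<name>...)` groups, so this shows the bookkeeping rather than a full regex parser.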



This also makes me think of your earlier mention of dynamic regexes: you would
probably expect any XSL parameter that represents text to possibly carry part
of the regex to be executed...

Both features more or less require nesting of regexes, and they would be
expected to clear out all side effects of the inserted parentheses. In other
words, end users should still be able to count opening parentheses to
understand what they want to cut out, and each named sub-regex should
introduce a prefix for its own set of parentheses.
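To make the side effect concrete, a small Python sketch (the parameter value and the `build` helper are hypothetical): when a runtime parameter is spliced into a pattern as regex text, any parentheses it contains shift the numbers of the groups after it; if it is meant as plain text, escaping it with `re.escape` avoids the shift altogether.

```python
import re


def build(param, escape):
    """Splice a runtime parameter into a pattern between two groups.
    With escape=True the parameter is matched literally (re.escape),
    so parentheses in it cannot create capturing groups."""
    middle = re.escape(param) if escape else param
    return re.compile(r"(\w+) " + middle + r" (\w+)")


param = "(a|b)c"  # hypothetical parameter value containing a group

as_regex = build(param, escape=False)
as_text = build(param, escape=True)

print(as_regex.groups)  # 3: the parameter's '(' pushed the last group to 3
print(as_text.groups)   # 2: escaped, the parameter adds no groups

m1 = as_regex.fullmatch("x bc y")
m2 = as_text.fullmatch("x (a|b)c y")
print(m1.group(3), m2.group(2))  # y y
```

When the parameter really is meant as regex text (and is a complete regex on its own), `re.compile(param).groups` gives the size of the shift, which is the offset a prefix scheme would have to add for the groups that follow it.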

Recursively, one would of course want to write
select-group=":subrgx1:subsubrgx11:3"
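Nested references can be handled by applying the same offset counting recursively: each sub-regex is expanded with its own path prefix, and its selectors are shifted by the number of groups opened before it. This is a hypothetical Python sketch: the registry contents and function names are made up (only `subrgx1` and `subsubrgx11` echo the example above), `(?:...)` is Python syntax, and nothing guards against a regex that refers to itself.

```python
import re

# Hypothetical registry; entries may themselves contain :name: references.
REGISTRY = {
    "subsubrgx11": r"([0-9]+)-([0-9]+)-([0-9]+)",
    "subrgx1": r"(v):subsubrgx11:",
}
REF = re.compile(r":([A-Za-z][\w-]*):")


def count_groups(fragment):
    """Count capturing '(' in a fragment (skips escapes, classes, '(?')."""
    count, i, in_class = 0, 0, False
    while i < len(fragment):
        c = fragment[i]
        if c == "\\":
            i += 2
            continue
        if in_class:
            in_class = c != "]"
        elif c == "[":
            in_class = True
        elif c == "(" and not fragment.startswith("?", i + 1):
            count += 1
        i += 1
    return count


def expand(pattern, path=""):
    """Recursively expand references. Returns (pattern, table, n_groups);
    table maps selectors such as '1' or ':subrgx1:subsubrgx11:3' to group
    numbers counted from the start of this pattern."""
    pieces, table = [], {}
    total = local = 0
    pos = 0

    def literal(chunk):
        nonlocal total, local
        for _ in range(count_groups(chunk)):
            total += 1
            local += 1
            table[f"{path}:{local}" if path else str(local)] = total
        pieces.append(chunk)

    for ref in REF.finditer(pattern):
        literal(pattern[pos:ref.start()])
        name = ref.group(1)
        # No cycle detection: a self-referencing regex would recurse forever.
        sub, sub_table, n = expand(REGISTRY[name], f"{path}:{name}")
        for key, rel in sub_table.items():
            table.setdefault(key, total + rel)  # shift by groups opened so far
        total += n
        pieces.append(f"(?:{sub})")
        pos = ref.end()
    literal(pattern[pos:])
    return "".join(pieces), table, total


pattern, table, n = expand(r"id=:subrgx1:;(\w+)")
m = re.fullmatch(pattern, "id=v2002-01-10;marc")
print(m.group(table[":subrgx1:subsubrgx11:3"]))  # 10
print(m.group(table["1"]))  # marc
```

The path prefix plays the role of the ":subrgx1:subsubrgx11:" part of the selector, so the top-level "1" still means the first parenthesis the matcher author actually wrote.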

-marc
>
> I just said all that to give you an idea about where I was coming from
> - I don't think that, at least in XSLT 2.0, this should necessarily be
> introduced because it's just a convenience (that leads to lots of
> other inconveniences!) rather than essential functionality.
>
> Cheers,
>
> Jeni
>
> ---
> Jeni Tennison
> http://www.jenitennison.com/
>


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

