RE: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)

Subject: RE: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)
From: "Marc Portier" <mpo@xxxxxxxxxxxxxxxx>
Date: Fri, 11 Jan 2002 23:58:05 +0100

Hi Jeni,

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx]On Behalf Of Jeni Tennison
> Sent: Friday, 11 January 2002 11:44
> To: Marc Portier
> Cc: Steven Noels; xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: Regular expression functions (Was: Re: [xsl] comments on
> December F&O draft)
>
>
> Hi Marc,
>
> > you mean: the *[index] is throwing all named subregexes on one array
> > and getting the second regardless of its name, right?
>
> Yes.
>
> > getting an actual parenthesis group out of a named subregex would be
> > different, no?
>
> I don't think it has to be, if you use elements with some standard
> name to represent them...
>
> Say you had:
>
> <regex name="fancy-number">[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?</regex>
> <matcher name="two-numbers" regexp=":fancy-number:\s:fancy-number:">
> ...
> </matcher>
>
> And you were matching the string:
>
>   "12.5 3.4E-2"
>
> I was imagining that you'd get built a tree that looked like
> (formatted for clarity - the only whitespace would actually be a
> single space between the two fancy-number elements):
>
>   <fancy-number>
>     12
>     <rxp:match>.5</rxp:match>
>   </fancy-number>
>   <fancy-number>
>     3
>     <rxp:match>.4</rxp:match>
>     <rxp:match>E-2</rxp:match>
>   </fancy-number>
>
> Where rxp is associated with some namespace like (for XPath anyway):
>
>   http://www.w3.org/2002/XPath/RegExp
>

great visualization of your view; I understand it clearly now. I was still on
some other path.

thinking about actually doing it... the reality of the regex engine is that,
while you would like the tree to appear somewhere in memory to select on, the
engine will hand you the match result in a more array-like format. e.g. for
- input: "3.4E-2"
- regex: [0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?
- the match result would be some kind of object with a getGroups() method
  returning something like an array holding
[0] = 3.4E-2
[1] = .4
[2] = E-2
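
To make the array-like result concrete, here is a minimal sketch using Python's re module as a stand-in for whatever regex engine is actually used. The get_groups() helper is a hypothetical name that mirrors the getGroups() idea above; it is not an API of any real engine.

```python
import re

# Same pattern as in the example above.
FANCY_NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?")

def get_groups(text):
    """Return [whole match, group 1, group 2, ...], or None if no match.

    Hypothetical helper: index 0 is the full match, as in the listing above.
    """
    m = FANCY_NUMBER.fullmatch(text)
    if m is None:
        return None
    return [m.group(0), *m.groups()]

print(get_groups("3.4E-2"))  # ['3.4E-2', '.4', 'E-2']
print(get_groups("12"))      # ['12', None, None]
```

Note that groups which did not take part in the match come back as None in this engine, so the array alone does not say where in the input each group sat.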

the expected internal resulting state (for xpath selecting):
     3
     <rxp:match>.4</rxp:match>
     <rxp:match>E-2</rxp:match>

would require you to also use the start and end positions that the regex
engine has to remember for every group it matched (I guess most engines expose
them, so it should be doable, but not trivial; it may call for optimizations
that easily lure people into writing new regex engines altogether?)
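
A rough sketch of that offset-based tree building, again with Python's re module standing in for the real engine. The match_to_markup() name is hypothetical, and the sketch assumes the groups are not nested and occur in left-to-right order, which holds for this pattern but not for regexes in general.

```python
import re
from xml.sax.saxutils import escape

FANCY_NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?")

def match_to_markup(m, name):
    """Wrap each matched group in <rxp:match>, using the group offsets.

    Assumption: groups are flat (not nested) and ordered left to right.
    """
    parts = []
    pos = m.start(0)
    for i in range(1, m.re.groups + 1):
        start, end = m.span(i)
        if start == -1:  # this group did not take part in the match
            continue
        parts.append(escape(m.string[pos:start]))  # plain text before the group
        parts.append("<rxp:match>" + escape(m.string[start:end]) + "</rxp:match>")
        pos = end
    parts.append(escape(m.string[pos:m.end(0)]))   # plain text after the last group
    return "<{0}>{1}</{0}>".format(name, "".join(parts))

m = FANCY_NUMBER.fullmatch("3.4E-2")
print(match_to_markup(m, "fancy-number"))
# <fancy-number>3<rxp:match>.4</rxp:match><rxp:match>E-2</rxp:match></fancy-number>
```

Handling nested parentheses would need a stack of open groups sorted by start offset, which is where the "not trivial" part comes in.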

reflecting (maybe wrongly) on what we did with regexslt: we pushed the tree
building into the nested matcher elements, so the regexes themselves could
still do the string-to-array thing; we then pop from those arrays into trees,
possibly handing selected groups over to nested submatchers...

the feeling I currently have is that this nested-regex-to-tree vision
directly has the power to circumvent the need for the nested matcher
structure altogether... meaning that with this kind of tree view on the
regex match result one can hand it over to the regular, well-known
XSL node juggling, I guess.
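
As an illustration of that hand-over, here is a small sketch that selects on the tree from your message using Python's ElementTree rather than a real XSLT processor. The wrapper <root> element (standing in for "/") and the value() helper are assumptions made for this sketch only.

```python
import xml.etree.ElementTree as ET

RXP = "http://www.w3.org/2002/XPath/RegExp"  # namespace suggested in the quoted message

# Assumption: a wrapper element plays the role of the document root "/".
doc = ET.fromstring(
    '<root xmlns:rxp="' + RXP + '">'
    '<fancy-number>12<rxp:match>.5</rxp:match></fancy-number>'
    ' '
    '<fancy-number>3<rxp:match>.4</rxp:match><rxp:match>E-2</rxp:match></fancy-number>'
    '</root>'
)

def value(node):
    """XPath-style string value: all descendant text, concatenated."""
    return "".join(node.itertext())

numbers = doc.findall("fancy-number")
print(value(doc))                       # 12.5 3.4E-2
print([value(n) for n in numbers])      # ['12.5', '3.4E-2']
print(numbers[0].text)                  # 12
print([value(c) for c in numbers[1]])   # ['.4', 'E-2']
```

Once the match result is a tree, nothing regex-specific is left in the selection step, which is the point about no longer needing the nested matchers.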

> So the values of the nodes selected by the following paths would be:
>
>   /                        =>  ("12.5 3.4E-2")
>   /fancy-number            =>  ("12.5", "3.4E-2")
>   /fancy-number[1]         =>  ("12.5")
>   /fancy-number[1]/node()  =>  ("12", ".5")
>   /fancy-number[1]/text()  =>  ("12")
>   /fancy-number[1]/*[1]    =>  (".5")
>   /fancy-number[1]/*[2]    =>  ()
>   /fancy-number[2]         =>  ("3.4E-2")
>   /fancy-number[2]/*       =>  (".4", "E-2")
>
>
> If you have named subexpressions within a named subexpression, that
> just changes the name of the element created for that subexpression.
> So if you had:
>
> <regex name="mantissa">[0-9]+(\.[0-9]+)?</regex>
> <regex name="exponent">[Ee][+-][0-9]+</regex>
> <regex name="fancy-number">:mantissa::exponent:?</regex>
> <matcher name="two-numbers" regexp=":fancy-number:\s:fancy-number:">
> ...
> </matcher>
>
> Matching the same string would give you a tree like:
>
>   <fancy-number>
>     <mantissa>12<rxp:match>.5</rxp:match></mantissa>
>   </fancy-number>
>   <fancy-number>
>     <mantissa>3<rxp:match>.4</rxp:match></mantissa>
>     <exponent>E-2</exponent>
>   </fancy-number>
>
> I should note that nothing existing in XPath or XSLT automatically
> creates a tree in this way.

And this would be an argument for keeping the nested-matcher approach: nested
matchers behave more like the familiar templates, which do create trees, I guess?

> However, several EXSLT functions do (as a
> means of returning 'sequences', in fact!). I suspect that the
> introduction of user-defined functions in XSLT will lead to more
> functions that do this, but don't know whether people would feel it
> was acceptable for a built-in function.
>

no idea :-(

kind regards,
-marc=


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

