RE: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)

Subject: RE: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)
From: "Marc Portier" <mpo@xxxxxxxxxxxxxxxx>
Date: Fri, 11 Jan 2002 01:59:39 +0100

Hi Peter,

> If a "matcher" explicitly returns a tree structure you could view it as
> sending the results to the output document.  Thus, wrapping it in
> a variable
> would allow you to manipulate the results in a natural  (XSL at
> least) way:
> 	<xsl:variable name="gunk">
> 	  	<xsl:apply-templates
> select-regexp=":fancy-number:\w:fancy-number:" />
> 	</xsl:variable>
> 	<xsl:value-of select="$gunk[2]/*[2]"/>
> (The template might actually be doing a "match-regexp", but to keep things
> concise let's pretend the example is complete :-).
> This seems to minimize the need to invent new stuff? Adding a similar form
> of regexp qualifier/attribute to copy and copy-of would seem to handle all
> general cases, no?

it does indeed; some questions/second thoughts:
in which context would this be useful?
on what input string would the regex be matched? the value of . (current
node/list/string)? its serialized version?

as for the syntax, for the sake of unique identification the given example
should at least name the subregex used, so more like
$gunk/fancy-number[2]/ or

as for the nature of how matchers work, there was an earlier remark that
regexes do not naturally return trees, and indeed, my finding is that they
rather return tables in which each row is one match and each column is one
group within that match...

<regex name="x">(a1)out(a2(a3))</regex>
<regex name="y">(b1):x:(:x:)?</regex>

then a match for regex=":x:" would kinda hold a result that looks like
(rowindexes are counted matches, col indices are counted parenthesis groups
in that match)
     c#0        c#1  c#2   c#3
r#1  a1outa2a3  a1   a2a3  a3
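
A minimal sketch of that table view, using Python's re module as a stand-in for whatever engine XSLT would end up with. The helper name match_table and the input string are assumptions made up for illustration; only the pattern for :x: comes from the message.

```python
import re

# Pattern for the named subregex :x: from the message.
X = r"(a1)out(a2(a3))"

def match_table(pattern, text):
    """One row per match; column 0 is the whole match, columns 1..n the groups."""
    return [[m.group(0), *m.groups()] for m in re.finditer(pattern, text)]

# Hypothetical input containing two matches, so the table gets two rows.
rows = match_table(X, "a1outa2a3 -- a1outa2a3")
for i, row in enumerate(rows, start=1):
    print(f"r#{i}", row)
```

Each row is flat: the nesting of (a3) inside (a2(a3)) only shows up as an extra column, not as a child of anything.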

the named-subregex idea Jeni introduced indeed gives a feeling of hierarchy,
but in essence it would still map onto a table... I don't fully grasp what
that would do to your idea of using it in the output tree; in any case the
result *could* be envisioned like this if we now match for :y:
fullregex == (b1)(a1)out(a2(a3))((a1)out(a2(a3)))?	--> 8 parenthesis groups
     c#0                   c#1  c#2  c#3   c#4  c#5        c#6  c#7   c#8
r#1  b1a1outa2a3           b1   a1   a2a3  a3
r#2  b1a1outa2a3a1outa2a3  b1   a1   a2a3  a3   a1outa2a3  a1   a2a3  a3
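
The expansion and the count of 8 groups can be checked with a short sketch. It assumes the :name: references are replaced purely textually before compiling; the NAMED registry and the expand helper are hypothetical, and Python's re module is again only a stand-in engine.

```python
import re

# Hypothetical registry of named subregexes in the :name: notation of this thread.
NAMED = {
    "x": r"(a1)out(a2(a3))",
    "y": r"(b1):x:(:x:)?",
}

def expand(pattern, named=NAMED):
    """Textually replace every :name: reference by its definition, recursively."""
    def substitute(m):
        return expand(named[m.group(1)], named)
    return re.sub(r":(\w+):", substitute, pattern)

full = expand(":y:")
compiled = re.compile(full)
print(full, "-->", compiled.groups, "parenthesis groups")

# The two rows of the table: without and with the optional trailing :x:.
row1 = compiled.fullmatch("b1a1outa2a3").groups()
row2 = compiled.fullmatch("b1a1outa2a3a1outa2a3").groups()
```

In this engine the columns left empty in r#1 come back as None, which is how an unmatched optional group is reported.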

however, since we assume the user can't keep count of the hidden parenthesis
groups inside the named :x:, we kinda introduced the hierarchical notation
that maps the column numbers to something more hierarchical looking:
assuming notedregex == (y1):x:(:x:)

point is how you note the expression would define which indexnotations are
assume we have some :z: == (c1)(:x:){2}
then the selection of index x[2] would have no meaning, since there is only
one x noted in the regex

and in normal regex behavior the numbered index 2 (2nd parenthesis) will
only hold the second occurrence of the :x: matching part of the string... it
is like writing (c1):x:(:x:)

this is how regexes work, I'm afraid... (on the other hand, the notations
:z: == (c1)(:x:)(:x:) and/or :z: == (c1)((:x:){2}) would possibly tackle what
you really need)
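
The three notations can be compared with a small sketch. To make the two iterations distinguishable, the hypothetical pattern x\d stands in for :x: here (the real :x: is a literal, so both occurrences would look identical); the input string is likewise an assumption.

```python
import re

text = "c1x1x2"  # hypothetical input: c1 followed by two different "x" parts

# (c1)(:x:){2}  -> the quantified group is overwritten on each iteration
repeated = re.fullmatch(r"(c1)(x\d){2}", text).groups()

# (c1)(:x:)(:x:) -> each occurrence gets its own group
spelled_out = re.fullmatch(r"(c1)(x\d)(x\d)", text).groups()

# (c1)((:x:){2}) -> the outer group keeps the whole repeated stretch
wrapped = re.fullmatch(r"(c1)((x\d){2})", text).groups()

print(repeated)     # ('c1', 'x2')
print(spelled_out)  # ('c1', 'x1', 'x2')
print(wrapped)      # ('c1', 'x1x2', 'x2')
```

So with the {2} form there is no way to get at the first occurrence by index; only the spelled-out form keeps both.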

dunno if all of this makes sense;
finding something that feels natural to both regex-savvy and XSLT-savvy
people will not be easy.

oh and by the way, I started off this :subregex: notation based on a bad
memory of long-past Perl days
just opened some docs again, and understand now that it used to be the
[:name:] notation for the POSIX character classes... with added possible
stuff like [:^name:] and the like

retracting my own introduction: maybe $name makes more sense in any case?
it would also be a nicer fit to the $keyword example Jeni gave.
(not to mention we wouldn't need <regex> but could just reuse
 and thus re-opening the discussion for dealing with the hidden () in the
name for the end-user: is it indeed so logical to do so? how does Perl do
it when you slide a $var into a regex? it probably counts through on the
$1, $2, $3, ... indices)
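
What "counting through" looks like can be sketched as follows. This uses Python's re module and plain string interpolation as a stand-in; whether it mirrors Perl exactly is an assumption, not something checked here. The KEYWORD fragment, the input string and the group names are all made up for illustration.

```python
import re

# Hypothetical reusable fragment that brings two capture groups of its own.
KEYWORD = r"(\w+)=(\w+)"

# Splicing the fragment in textually: its groups are numbered with the rest,
# so the trailing (\d+) ends up as group 4, not group 3.
spliced = re.compile(rf"(\d+) {KEYWORD} (\d+)")
numbered = spliced.fullmatch("10 key=val 20").groups()

# Named groups spare the end-user from counting the hidden parentheses.
named = re.compile(
    r"(?P<first>\d+) (?P<kw>(?P<k>\w+)=(?P<v>\w+)) (?P<last>\d+)"
)
n = named.fullmatch("10 key=val 20")
by_name = (n["first"], n["kw"], n["last"])

print(numbered)  # ('10', 'key', 'val', '20')
print(by_name)   # ('10', 'key=val', '20')
```

The numbered form forces the user to know how many groups hide inside the fragment; the named form does not, which is the argument for some kind of named or hierarchical index.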

