Subject: RE: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)
From: "Marc Portier" <mpo@xxxxxxxxxxxxxxxx>
Date: Sat, 12 Jan 2002 12:38:38 +0100
Hi Jeni,

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx]On Behalf Of Jeni Tennison
> Sent: Friday, 11 January 2002 12:25
> To: Marc Portier
> Cc: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: Regular expression functions (Was: Re: [xsl] comments on
> December F&O draft)
>
>
> Hi Marc,
>
> > assume we have some :z: == (c1)(:x:){2} then the selection of index
> > x[2] would have no meaning, since there is only one x noted in the
> > regex
> >
> > and in normal regex behavior the numbered index 2 (2nd parenthesis) will
> > only hold the second occurrence of the :x: matching part of the string...
> > it is as writing (c1):x:(:x:)
>
> That's an interesting point. Assuming x matched 'c2', then that would
> mean a structure of:
>
> <z>
>   <rxp:match>c1</rxp:match>
>   c2
>   <x>c2</x>
> </z>
>

(referring to the nested-regex vs nested-matcher discussion)

I should check it out, but I'm really afraid the match-result groups[] here
would actually be, in the case of a (c1)(:x:){3} with :x: going for c2:

  [0] c1c2c2c2
  [1] c2 (the last of the 3)

and even the start-end positions would not be of more help...
it's the regex engine's way of saying you should write it differently if you
want it to behave differently

getting it into

<z>
  <rxp:match>c1</rxp:match>
  c2c2
  <x>c2</x>
</z>

leaving little xpath-natural feeling for getting to the 1st or 2nd 'c2'...
which might be against natural xslt feelings?

and it only gets worse when adding {n,m} kind of things in there :-(

somewhere internally the regex engines need to know about the earlier matches
though... different notations only tell it that it can forget about them...

> > this is how regexes are working I'm afraid...
> > (on the other hand, the notations :z: == (c1)(:x:)(:x:) and/or
> > :z: == (c1)((:x:){2}) would possibly tackle what you really need)
>
> Yes - with the second of these, you would get something like:
>
> <z>
>   <rxp:match>c1</rxp:match>
>   <rxp:match>
>     c2
>     <x>c2</x>
>   </rxp:match>
> </z>
>
> which would at least allow you to get the result of the two xs
> combined.

yep.

> > oh and by the way, I started off this :subregex: notation, based on bad
> > memory of long-past perl days
> > just opened some doc again, and understand now that it used to be the
> > [:name:] notation for the posix characters... with added possible stuff
> > like [:^name:] and the like
>
> Hmm... Perl uses that notation for named character classes. The
> equivalent in the XML Schema regular expression language is roughly:
>
>   \p(name)    (characters in the named class)
>   \P(name)    (characters not in the named class)
>
> That's a different kind of thing to what we're doing here (where the
> named expressions are complete regular expressions rather than
> character classes). I'd be tempted to introduce a different escape
> character to do it, for example e (for expression):
>
>   \e(name)    (the named subexpression)
>   \E(name)    (not the named subexpression, if that's appropriate?)
>

wow, great idea, sounds like something to propose/bounce off some perl
mailing list as well...

> So something like:
>
>   \e(mantissa)\e(exponent)?
>
> > revoking my own introduction: maybe $name makes more sense in any
> > case?
>
> Using $name in the regular expression might be confusing - you'd need
> to make sure you could detect the end of the name, so probably ($name)
> would be better. (I think that if $ is introduced as matching the end
> of the string then you could safely state that it only matched the end
> of the string if it was at the end of the regular expression.)
>
> So something like:
>
>   ($mantissa)($exponent)
>
> I'd suggest {$name}, but only if regular expression support wasn't
> ever available through functions (because {$name} looks a lot like an
> AVT, and would make people think that they could put AVTs in
> attributes that held expressions).
>
> If the references look like variable references then they should
> probably be set with variable-binding elements (e.g. xsl:variable).

yep, also assuming you read and go along with the remark that parentheses in
these variables are to be matched literally, as \( and \)?
and thus keep these next to the regex nesting with \e()

>
> Cheers,
>
> Jeni
>
> ---
> Jeni Tennison
> http://www.jenitennison.com/
>
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
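
The capture behaviour discussed in this message (a quantified group keeping only its last iteration) can be checked with a small sketch. It uses Python's `re` module as a stand-in for whatever regex engine an XSLT processor might use, which is an assumption; the literal `c2` takes the place of the named subexpression `:x:`, and the input string is illustrative only. Note that Python numbers the `c1` capture as group 1, so the repeated group is group 2.

```python
import re

# Illustrative input from the thread: 'c1' followed by three 'c2' tokens.
text = "c1c2c2c2"

# (c1)(c2){3}: the quantified group keeps only its LAST iteration.
m = re.fullmatch(r"(c1)(c2){3}", text)
whole = m.group(0)      # the complete match
groups = m.groups()     # one entry per parenthesis pair, not per iteration
last_span = m.span(2)   # start/end offsets of the last 'c2' only

# (c1)((c2){3}): an outer group captures the combined run of repetitions.
m2 = re.fullmatch(r"(c1)((c2){3})", text)
combined = m2.group(2)

# (c1)(c2)(c2)(c2): one group per occurrence keeps each match addressable.
m3 = re.fullmatch(r"(c1)(c2)(c2)(c2)", text)
separate = m3.groups()

print(whole, groups, last_span, combined, separate)
```

Under these assumptions the earlier iterations are not reachable through the match result, and the span of the repeated group covers only the final `c2`. That matches the point above: the pattern has to be written differently (an outer group, or one group per occurrence) to get at the 1st or 2nd match.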