Subject: Re: [xsl] Re: A question about the expressive power and limitations of XPath 2.0
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Sun, 13 Jan 2002 15:14:07 +0000

Hi David,

> I think that there are three separate problems that might be addressed:
>
> 1) deficiencies in the regular expression syntax/semantics.
>    This may or may not include lack of ^ and $ to match start and end of
>    expression or perl style {2} repeat clauses. (Mainly it's hard to
>    know what's there now as the text is a bit underspecified, hence my
>    "overlapping regexp" question)

There are Perl-style {2} repeat clauses in the XML Schema regular
expression language (http://www.w3.org/TR/xmlschema-2/#regexs), which I
think is what XPath 2.0 will be using (and which is why I've been
escaping my {s).

In the suggestion that I made, you wouldn't really need ^ and $,
actually, because the static regexps always test the *entire string*
(there's an implicit ^ and $) and if you use the tokenize() function
you can always test whether the first string is '' (in which case it
starts with your regular expression) and/or if there are only two
items in your list (in which case it ends with the regular
expression).

However, I suspect that test(), match() and replace() functions will
still be specified, and those do need ^ and $ to make them useful, I
think.

> 2) Possibilities for doing tree generation as well as string generation
>    once the match is found. (Note this is purely an XSLT construction
>    issue it doesn't affect the languages you accept, only what you can
>    do with them). This is where I came in with the regexp matching
>    template mechanism, and you've extended in various ways with named
>    subexpression possibilities.

Yah. I think the named subexpressions are overkill :) I like the stuff
that I wrote this morning a lot better. But the current-match()
function could still give a tree representation of the match using
rxp:match or whatever elements, as I suggested in a message to Marc
recently. I don't know whether it's worth it - I kinda like the tree
access 'cos it's easy to address and process trees, it'd be nicely
expandable into named subexpressions some day, and I think it helps
with regular expressions where there are lots of brackets that you
really don't care about. On the other hand, it's a departure from what
you get in other environments, so people used to Perl/emacs/sed or
whatever might not like it. What do you think?

We would have to address here the problem that Marc pointed out to do
with how repeated subexpressions are captured...

> 3) possibilities for accepting non regular languages in input strings.
>    three examples given so far in this thread, nested {} pairs,
>    html nested elements tag syntax, the classic non regular example
>    of a string consisting of a and b with as many a as b.

I talked about the first two in what I wrote this morning. For the
latter, I think you could use tokenize() to split the string up in two
different ways: once on a and once on b, filter out the strings at odd
positions (which leaves just the matches), and then compare the
lengths of the two sequences:

  tokenize('abbaab', 'a')
    => ('', 'a', 'bb', 'a', '', 'a', 'b')
  ('', 'a', 'bb', 'a', '', 'a', 'b')[position() mod 2 = 0]
    => ('a', 'a', 'a')

  tokenize('abbaab', 'b')
    => ('a', 'b', '', 'b', 'aa', 'b')
  ('a', 'b', '', 'b', 'aa', 'b')[position() mod 2 = 0]
    => ('b', 'b', 'b')

  count(('a', 'a', 'a')) = count(('b', 'b', 'b'))
    => true

I think that's a reasonable series of hoops to go through - the full
expression is only:

  count(tokenize($string, 'a')[position() mod 2 = 0]) =
  count(tokenize($string, 'b')[position() mod 2 = 0])

It'd be nice to have even() and odd() functions :)

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
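
For anyone who wants to play with the "as many a as b" trick outside
XSLT, here is a minimal Python sketch. It assumes the tokenize()
behaviour used in the examples in this message (the result alternates
unmatched text and matches, a leading '' is kept, a trailing '' is
dropped), which is only a proposal and not a specified function. The
names tokenize and same_count are illustrative, and the sketch assumes
the pattern never matches the empty string.

```python
import re


def tokenize(string, pattern):
    """Split `string` into alternating unmatched text and matches.

    Assumed semantics (taken from the examples in the message, not from
    any specification): 1-based odd positions hold the text between
    matches, even positions hold the matches themselves, and a trailing
    empty string is dropped.
    """
    pieces = []
    last = 0
    for m in re.finditer(pattern, string):
        pieces.append(string[last:m.start()])  # text before this match
        pieces.append(m.group(0))              # the match itself
        last = m.end()
    if last < len(string):
        pieces.append(string[last:])           # text after the final match
    return pieces


def same_count(string, first="a", second="b"):
    """True if `first` and `second` match equally often in `string`."""
    # seq[1::2] picks the items at even 1-based positions,
    # i.e. the XPath predicate [position() mod 2 = 0].
    matches_first = tokenize(string, first)[1::2]
    matches_second = tokenize(string, second)[1::2]
    return len(matches_first) == len(matches_second)


if __name__ == "__main__":
    print(tokenize("abbaab", "a"))
    print(tokenize("abbaab", "b"))
    print(same_count("abbaab"))
```

The slice seq[1::2] plays the role that an even() function would play
in the XPath version.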