Re: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)

From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Mon, 14 Jan 2002 16:00:57 +0000

Hi Chris,

>>   - open up normal templates so that they can match things other than
>>     nodes
> What is wrong with that? A template that matches text is pretty much
> the end of the line anyway.

Nothing's wrong with that. I think it's a good idea generally (as I
think I've argued here before).  It's not something that's currently
allowed, though.

One of the reasons that David and I have been talking about:

>>   - introduce specific regexp templates

instead is, I think, that it makes it easier for the processor to
identify when it has to keep track of the results of a regular
expression. Not that you *couldn't* do this with the kind of pattern
syntax that you were suggesting, it's just that I think it would be a
bit more difficult.

>> Unfortunately, assuming greedy, (a)(b) would produce:
>>   <x>a)(b</x>
> Yeh but it doesn't have to be greedy.
> <xsl:template match="\((.*?)\)(.*)">
>         <x><xsl:apply-templates select=".[1]" /></x>
>         <xsl:apply-templates select=".[2]" />
> </xsl:template>
> Or
> <xsl:template match="\((.*?)\)">
>         <x><xsl:apply-templates select=".[1]" /></x>
>         <xsl:apply-templates select="$'" />
> </xsl:template>

...OK, now try running those templates with your original string
"(a(b(c)d)e)". I think that you just get:


Dimitre and David have kindly explained to me that regular expressions
cannot be used for nested structures such as these because languages
that permit arbitrarily nested structures are not regular languages,
and therefore cannot be described by regular expressions. (I think I
got that right - I'm sure I'll be corrected if I didn't ;)

[Just to point out that there's no syntax for non-greedy matches in
the XML Schema regular expression definition - I've posted to one of
the comments lists about this (I think) but if you think it's a useful
thing to have, I'd recommend commenting to
www-xml-query-comments@xxxxxx as well.]
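
To make the greedy and non-greedy behaviours concrete, here is a
minimal sketch in Python rather than in the proposed XSLT syntax. It
assumes Python's re module as a stand-in for whatever regular
expression dialect XSLT ends up with (Python does support the .*?
non-greedy form); the variable names are just for illustration:

```python
import re

# Greedy: .* runs on to the last ")" in the string.
greedy = re.match(r"\((.*)\)", "(a)(b)")
greedy_group = greedy.group(1)

# Non-greedy: .*? stops at the first ")", wherever it falls in the nesting.
lazy = re.match(r"\((.*?)\)(.*)", "(a(b(c)d)e)")
lazy_groups = lazy.groups()

print(greedy_group)  # a)(b
print(lazy_groups)   # ('a(b(c', 'd)e)')
```

The non-greedy match pairs the first "(" with the first ")", not with
its partner, so neither form follows the nesting.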

>> Or the other option is to have a special syntax to refer to a
>> regular expression, 
> You mean like text()['regexp']
> Which can't be confused with text()[normalize-space()]

This amounts to drawing a distinction between:


One of the features of functions is that when you call them, the call
is replaced by the result of the call. So if the context size is 2,
then text()[last()] is equivalent to text()[2].
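
As a rough illustration of that substitution, here is a sketch using
Python's xml.etree.ElementTree, which supports only a small XPath
subset. It has no text() node test, so the example selects elements
instead of text nodes; the document and element names are made up:

```python
import xml.etree.ElementTree as ET

# Two <item> children, so the context size for item[...] is 2.
root = ET.fromstring("<doc><item>first</item><item>second</item></doc>")

by_last = [e.text for e in root.findall("item[last()]")]
by_index = [e.text for e in root.findall("item[2]")]

print(by_last)   # ['second']
print(by_index)  # ['second']
```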

You could say that literal strings got special handling in predicates
in the same way that literal numbers get special handling in
predicates. So when you have a literal string in a predicate it's
equivalent to a call on a test() function of some sort, with the
context item as the first argument and the string as the second
argument. But this would fundamentally change what
text()[normalize-space()] actually meant, which I think is quite a
large backwards-incompatible change.
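
A sketch of why, in Python. The first function follows the XPath 1.0
predicate rule (a number is compared with the context position,
anything else is converted to a boolean). The second is a hypothetical
variant in which a string value is tested as a regular expression
against the context item; both function names are invented for the
example, and Python's re module stands in for the regular expression
dialect:

```python
import re

def xpath1_predicate(value, position):
    """XPath 1.0 rule: numbers are position tests, the rest become booleans."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value == position
    return bool(value)  # a string is true when it is non-empty

def regex_predicate(value, position, context_text):
    """Hypothetical rule: a string value is a regexp tested on the context item."""
    if isinstance(value, str):
        return re.search(value, context_text) is not None
    return xpath1_predicate(value, position)

text_node = " a+b "
normalized = " ".join(text_node.split())  # stands in for normalize-space()

current_rule = xpath1_predicate(normalized, 1)
regex_rule = regex_predicate(normalized, 1, text_node)

print(current_rule)  # True: the normalized string is non-empty
print(regex_rule)    # False: "a+b" as a regexp does not match " a+b "
```

Once the call has been replaced by its result, the predicate only sees
a string, so it cannot tell normalize-space() apart from a literal.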

I was meaning something like (desperately trying to find a suitable
ASCII character...):


>> The way I (and I think David) was thinking, you'd use
>> current-match() or some other function to get information about the
>> subexpression matches when you were inside the template. So
>> perhaps:
>>   current-match()[x]
>> rather than .[x].
> Well if you like typing ;-)

Not particularly, but I do value consistency :) The other thing we'd
discussed was the implicit assignment of variables $1, $2 and so on -
if you don't like typing perhaps they're more your cup of tea.

I kinda decided I didn't like the idea because XSLT never implicitly
assigns values to variables anywhere else. I also thought that
current-match() fitted in better with current-group(), which does the
same kind of thing (adds something to the available context).



Jeni Tennison