Re: [xsl] Re: A question about the expressive power and limitations of XPath 2.0

From: David Carlisle <davidc@xxxxxxxxx>
Date: Sun, 13 Jan 2002 12:54:29 GMT
> so you can't check that the name in the end tag is the same as the
> name in the start tag) Examples:

Jeni, if you mean here that you can't tell which start tag corresponds
to which end tag, that is not a deficiency in the currently specified
regular expression syntax; it's a statement that the language you are
trying to accept is not regular.

I think that there are three separate problems that might be addressed:

1) Deficiencies in the regular expression syntax/semantics.
   This may or may not include the lack of ^ and $ to match the start
   and end of the string, or Perl-style {2} repeat clauses. (Mainly
   it's hard to know what's there now, as the text is a bit
   underspecified, hence my "overlapping regexp" question.)

2) Possibilities for doing tree generation as well as string generation
   once the match is found. (Note this is purely an XSLT construction
   issue: it doesn't affect the languages you accept, only what you can
   do with them.) This is where I came in with the regexp matching
   template mechanism, which you've extended in various ways with named
   subexpression possibilities.

3) Possibilities for accepting non-regular languages in input strings.
   Three examples have been given so far in this thread: nested {}
   pairs, HTML-style nested element tag syntax, and the classic
   non-regular example of strings of a and b containing as many a as b.

   Here you've suggested moving away from regular expressions to
   languages specified explicitly by giving grammars, in the style of
   lex. I'd hoped (but haven't been able to spec cleanly so far) to
   stay with just adding regexp functionality even in this case, as
   that is (often) enough to tokenise the input string, leaving the
   extra state information required to parse the tokens to be handled
   by existing XSLT control constructs.
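
A rough sketch of that "tokenise with a regexp, keep the state
elsewhere" idea, written in Python rather than XSLT purely for
illustration. The token pattern and the function name tags_balanced are
made up for this example, and the tag syntax is heavily simplified (no
attributes, comments, or empty-element tags):

```python
import re

# Hypothetical token pattern: a simplified start tag, end tag, or run of text.
# Attributes, comments, and empty-element tags are deliberately ignored.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")

def tags_balanced(text):
    """Return True if every start tag is closed by an end tag of the same name."""
    stack = []  # the extra state that a regular expression alone cannot hold
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            return False  # a stray '<' that is not part of a recognised tag
        pos = m.end()
        closing, name, _chars = m.groups()
        if name is None:
            continue  # character data, nothing to track
        if closing:
            # An end tag must match the most recently opened element.
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack  # anything left open means an unclosed element
```

The regexp only ever recognises one token at a time; the stack, kept by
ordinary control flow, is what carries the nesting information.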


Like Dimitre, it's around a decade ago that I last thought about this
stuff for real, and the precise definitions of all the various classes
of language can get very technical (and I can't remember them :-). But
the differences, especially between regular and non-regular languages,
can be important, as they make precise which kinds of language can be
accepted by each system, and which cannot be accepted just by tinkering
with the control syntax and require a different parsing technique
altogether.

> Creating a regular expression that matches start and end tags in
> content (

So here, for example, you can create regexps that match start tags and
end tags separately. You can even create a regexp that will match a
start tag to its matching end tag so long as there are no more than,
say, 50 nested subelements. But you can't do the general case with _any_
syntax for regular expressions ('cause if you extend the syntax enough
to do this, it isn't regular any more).
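
The bounded-depth point can be illustrated with the simpler nested {}
case from earlier in the thread. This is only a sketch in Python (not
XSLT), and nested_braces and deep are names invented for the example. It
builds a regexp by wrapping the previous one, so each extra level of
nesting needs a longer pattern:

```python
import re

def nested_braces(depth):
    """Build regexp source matching {} groups nested at most `depth` levels deep."""
    pattern = r"[^{}]*"  # depth 0: text containing no braces at all
    for _ in range(depth):
        # One more level: brace-free text interleaved with {...} groups
        # whose contents match the pattern for the previous depth.
        pattern = r"[^{}]*(?:\{" + pattern + r"\}[^{}]*)*"
    return pattern

def deep(n):
    """Hypothetical helper: n opening braces followed by n closing braces."""
    return "{" * n + "}" * n

three = re.compile(nested_braces(3))
ok = three.fullmatch(deep(3))        # a match object: depth 3 is within the bound
too_deep = three.fullmatch(deep(4))  # None: one level more than was built in
```

Any fixed bound can be handled this way, but the pattern has to grow
with the bound, which is why no single regexp covers unbounded nesting.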

David
