Subject: Re: [xsl] Grammars for XPath 2.0: which to use?
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Fri, 13 Jul 2007 06:40:20 -0700
> Although in the book I was definitely writing for users of the language rather than parser-writers, I didn't want to depart too far from the published grammar, so these compound symbols appear as <cast as>, which I think is actually quite a good compromise, though you need to read the accompanying text to see that you're actually allowed to have a comment in the middle of it.
--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
You've achieved success in your field when you don't know whether what you're doing is work or play
I don't think there were many significant grammar changes after the book was printed, only one or two minor ones such as changing empty() to empty-sequence(). There may also have been a few clarifications of lexical rules, for example the fact that (10div 3) is illegal: there must be a space between "10" and "div". (This question arose with (if($X)then 10else 20), where the "e" can be read as the start of the exponent in a numeric literal.)
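The delimitation rule above can be illustrated with a small Python sketch. It assumes a simplified tokenizer of our own: the function name `tokenize`, the ASCII-only name pattern and the coarse token kinds are illustrative assumptions, not taken from the spec or from Saxon. It rejects a numeric literal that is immediately followed by a name, which reproduces the behaviour described for "10div 3" and "10else".

```python
import re

# One combined pattern. The "num" alternatives follow the shapes of
# DoubleLiteral, DecimalLiteral and IntegerLiteral in the XPath 2.0 grammar,
# with DoubleLiteral tried first so that "1.5e3" stays a single token.
# Names are simplified to ASCII NCName-like strings (an assumption).
TOKEN = re.compile(
    r"(?P<ws>\s+)"
    r"|(?P<num>(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)[eE][+-]?[0-9]+"
    r"|\.[0-9]+|[0-9]+\.[0-9]*|[0-9]+)"
    r"|(?P<name>[A-Za-z_][A-Za-z0-9_.\-]*)"
    r"|(?P<sym>.)",
    re.DOTALL,
)


def tokenize(expr: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs for expr, dropping whitespace.

    Raises ValueError when a numeric literal is directly followed by a name,
    as in "10div 3" or "10else": both are non-delimiting tokens, so they
    must be separated.
    """
    tokens: list[tuple[str, str]] = []
    prev_kind = None
    for m in TOKEN.finditer(expr):
        kind = m.lastgroup
        if kind == "name" and prev_kind == "num":
            raise ValueError(
                f"whitespace required before {m.group()!r} at offset {m.start()}"
            )
        prev_kind = kind
        if kind != "ws":
            tokens.append((kind, m.group()))
    return tokens


print(tokenize("10 div 3"))  # [('num', '10'), ('name', 'div'), ('num', '3')]
for bad in ("10div 3", "(if($X)then 10else 20)"):
    try:
        tokenize(bad)
    except ValueError as err:
        print(bad, "->", err)
```

Note that "10else" fails even though "10e" looks like the start of a double: the exponent needs digits, so the lexer falls back to the integer "10" and then meets the name "else" with no separator. Writing "then 10 else 20" tokenizes cleanly.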
At the time I wrote the book, the draft spec was still using compound symbols like <"cast" "as">. These were subsequently removed, as a result of a decision to present a spec that was more a description of the legal sentences in the language and less a recipe for writing a parser. Although in the book I was definitely writing for users of the language rather than parser-writers, I didn't want to depart too far from the published grammar, so these compound symbols appear as <cast as>. I think that is actually quite a good compromise, though you need to read the accompanying text to see that you're allowed to have a comment in the middle of it.
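To make the "comment in the middle" point concrete, here is a minimal Python sketch of a hand-written scanner. The helper names `skip_ignorable`, `word_end` and `match_compound` are hypothetical. It treats "cast" and "as" as separate words and skips any whitespace or XPath comments "(: ... :)" between them. Because XPath 2.0 comments can nest, the skipper keeps a depth counter rather than relying on a single regular expression.

```python
from typing import Optional

NAME_CHARS = "_-."  # allowed in names besides letters and digits (simplified)


def skip_ignorable(text: str, pos: int) -> int:
    """Skip whitespace and (possibly nested) "(: ... :)" comments from pos."""
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
        elif text.startswith("(:", pos):
            depth = 0
            while pos < len(text):
                if text.startswith("(:", pos):
                    depth += 1
                    pos += 2
                elif text.startswith(":)", pos):
                    depth -= 1
                    pos += 2
                    if depth == 0:
                        break  # outermost comment closed
                else:
                    pos += 1
            else:
                raise ValueError("unterminated comment")
        else:
            break
    return pos


def word_end(text: str, pos: int, word: str) -> Optional[int]:
    """Return the index after `word` if it occurs at pos as a whole word."""
    end = pos + len(word)
    if not text.startswith(word, pos):
        return None
    if end < len(text) and (text[end].isalnum() or text[end] in NAME_CHARS):
        return None  # e.g. "castas" is one name, not "cast" "as"
    return end


def match_compound(text: str, pos: int, first: str, second: str) -> Optional[int]:
    """Match `first`, optional whitespace/comments, then `second`."""
    end = word_end(text, pos, first)
    if end is None:
        return None
    return word_end(text, skip_ignorable(text, end), second)


src = "$x cast (: a (: nested :) comment :) as xs:integer"
end = match_compound(src, src.index("cast"), "cast", "as")
print(repr(src[end:]))  # ' xs:integer'
```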
Some of the complexity in the spec, especially the Note you reference (which was at one time part of the spec), arises from XQuery, which adds quite a few complications to the already-complicated rules for XPath. I think it's true that in XPath, unlike XQuery, you can tokenize without knowledge of the grammatical context. The Saxon parser does a "raw" tokenization, which for XPath is essentially context-free, and then adds some processing between the lexer and the syntax analyzer that classifies tokens more precisely based on the immediately preceding and following tokens. So there's a separation between the two traditional tasks of a lexer: splitting the text into tokens and classifying the tokens. In other cases, however, for example the distinction between "+" as an operator and "+" as an occurrence indicator, it's left to the syntax analyzer to distinguish them.
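The two-stage arrangement described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Saxon's actual code: the token subset, the helper names `raw_tokens` and `classify`, and the kind labels are all hypothetical, and the classification rules are the XPath 1.0 disambiguation rules from section 3.7 of that recommendation, a simpler cousin of what an XPath 2.0 parser needs. Phase 1 splits the text with no knowledge of context. Phase 2 decides whether `*` is a multiplication or a wildcard, and whether a name is an operator, function name, axis name or name test, by looking only at the previous and next tokens.

```python
import re

# Phase 1: context-free "raw" tokenization. Covers only a subset of XPath 1.0
# tokens; names are simplified to ASCII NCName-like strings (an assumption).
WS = re.compile(r"\s*")
TOKEN = re.compile(
    r"(?P<var>\$[A-Za-z_][A-Za-z0-9_.\-]*)"
    r"|(?P<name>[A-Za-z_][A-Za-z0-9_.\-]*)"
    r"|(?P<num>[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"
    r"|(?P<sym>::|//|!=|<=|>=|\.\.|[()\[\]@,/|+\-=<>*.])"
)


def raw_tokens(expr: str) -> list[tuple[str, str]]:
    """Split expr into coarse (kind, text) pairs without looking at context."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    while True:
        pos = WS.match(expr, pos).end()
        if pos == len(expr):
            return tokens
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character {expr[pos]!r} at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()


# Phase 2: refine each coarse kind from its neighbours (XPath 1.0, sec. 3.7).
OPERATOR_NAMES = {"and", "or", "mod", "div"}
NO_OPERATOR_AFTER = {"@", "::", "(", "[", ",",
                     "/", "//", "|", "+", "-", "=", "!=", "<", "<=", ">", ">="}


def classify(tokens: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (text, refined_kind) pairs."""
    result: list[tuple[str, str]] = []
    for i, (kind, text) in enumerate(tokens):
        # An operator is expected only when a preceding token exists and it
        # is neither one of NO_OPERATOR_AFTER nor itself an operator.
        after_operand = bool(result) and not (
            result[-1][0] in NO_OPERATOR_AFTER
            or result[-1][1] in ("operator-name", "multiply")
        )
        nxt = tokens[i + 1][1] if i + 1 < len(tokens) else None
        if text == "*":
            refined = "multiply" if after_operand else "wildcard"
        elif kind == "name":
            if after_operand:
                if text not in OPERATOR_NAMES:
                    raise ValueError(f"operator expected, found {text!r}")
                refined = "operator-name"
            elif nxt == "(":
                refined = "function-name"  # also node(), text(), etc.
            elif nxt == "::":
                refined = "axis-name"
            else:
                refined = "name-test"
        else:
            refined = {"var": "variable", "num": "number"}.get(kind, "symbol")
        result.append((text, refined))
    return result


for text, kind in classify(raw_tokens("child::div div 2")):
    print(f"{text:6} {kind}")
```

Run on "child::div div 2", this classifies the first div as a name test and the second as an operator name. Distinctions that need real grammatical context, such as "+" as an operator versus an occurrence indicator in XPath 2.0, are left to the syntax analyzer, and the sketch does not attempt them.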
Michael Kay http://www.saxonica.com/
> -----Original Message-----
> From: Dimitre Novatchev [mailto:dnovatchev@xxxxxxxxx]
> Sent: 13 July 2007 05:12
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Grammars for XPath 2.0: which to use?
>
> Recently I've been having fun with parsing context-free
> languages using a general parser for LR languages, written in
> XSLT 2.0.
>
> The first and easier language was JSON, leading to the
> addition of two new functions to FXSL:
> f:json-document()
> and
> f:json-file-document()
>
> as reported in this list and in my blog.
>
> The second language I played with was XPath. As I mentioned
> earlier in this list, it was almost straightforward and
> non-problematic to create a working parser (right now
> constructing just a parse tree for an XPath expression). The
> reason for this easiness is that Dr. Kay's XPath 2.0 book is
> an excellent reference material both in describing the
> terminal symbols (lexical tokens) of the language and its grammar.
>
> My question is whether the XPath 2.0 grammar as described in
> the book is still equivalent to the one described in the
> XPath 2.0 recommendation (http://www.w3.org/TR/xpath20/#id-grammar)
>
> or if there are any differences?
>
> Certainly, I could try comparing both grammars myself, but
> why not ask and get this valuable information straight from
> the horse's mouth? I believe this is also valuable to the
> readers of xsl-list.
>
> As the official W3 XPath 2.0 recommendation is not so easy to
> read as Dr. Kay's book, I would prefer to be able to continue
> using the grammar from his book (possibly with appropriate
> modifications).
>
> The same question can be asked about the definition of the
> terminal symbols. Here we have:
>
> 1. Dr. Kay's book.
>
> 2. The official W3 XPath 2.0 recommendation
> (http://www.w3.org/TR/xpath20/#terminal-symbols)
>
> 3. A seemingly outdated W3 document "Building a Tokenizer
> for XPath or XQuery" (http://www.w3.org/TR/xquery-xpath-parsing/)
>
> In implementing the lexical scanner (again in pure XSLT 2.0)
> I again used Dr. Kay's book (1), found (2) quite confusing,
> and definitely decided not to use any of the approaches
> described in (3). It might be interesting to know that
> determining the next terminal symbol can be accomplished
> based on the evaluation of a single regular expression
> (shall I call this "one-pass approach"?).
>
> --
> Cheers,
> Dimitre Novatchev
> ---------------------------------------
> Truly great madness cannot be achieved without significant
> intelligence.
> ---------------------------------------
> To invent, you need a good imagination and a pile of junk
> -------------------------------------
> You've achieved success in your field when you don't know
> whether what you're doing is work or play