Subject: Re: [xsl] Are XPath expressions parsed using compiler parsing techniques?
From: "John Lumley john@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 10 May 2022 10:07:17 -0000
An XPath expression is a query. Not against a database, but against an XML document. Are XPath expressions parsed using compiler parsing algorithms? Is a syntax tree constructed for an XPath expression? Is the syntax tree traversed?
* Using a parser-generator can get you running much more quickly and lets you adapt to changes in the grammar much more rapidly, but diagnostics at the parse stage are at most very generic ("Expecting 'foo' at line 17, character 6, got 'bar'"), and outside development you're running a much less time-efficient parser.
* Writing a specific parser (assuming the grammar isn't full of ambiguity traps) takes much longer, but gives you much better control over the results and diagnostics. Error messages can be much better tailored ("The cardinality indicator for a sequence type must be one of *|+|? - provided '$'"), and concurrent type analysis, small-scale static error detection and small-scale optimisation (e.g. arithmetic constant expression reduction, as Dimitre has shown) can be incorporated. But if the grammar changes drastically, you might have to do a lot of rework, depending on the architecture of your parser. (As an example, adding some support for putative XPath4 operators such as 'otherwise' needed some changes to the tokenizer and additional cases in some of the parse function switches, but fitting the new operator into the class-based parser was fairly straightforward.)
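To make the hand-written approach above a little more concrete, here is a minimal sketch in Python of a recursive-descent parser that folds arithmetic constants while it parses and reports errors in terms of the construct it was working on. It is a toy grammar (integers, $variables, parentheses, + - *), not XPath, and every name in it (Parser, ParseError, fold) is made up for illustration rather than taken from any real processor.

```python
import re


class ParseError(Exception):
    """Raised with a message tailored to the construct being parsed."""


def fold(op, left, right):
    # Small-scale optimisation: reduce constant sub-expressions while parsing.
    if isinstance(left, int) and isinstance(right, int):
        return {"+": left + right, "-": left - right, "*": left * right}[op]
    return (op, left, right)


class Parser:
    def __init__(self, text):
        # Tokens are (lexeme, offset): integers, $variables, or any one symbol.
        self.tokens = [(m.group(), m.start())
                       for m in re.finditer(r"[0-9]+|\$\w+|\S", text)]
        self.i = 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def fail(self, missing):
        # The caller says what it was looking for, so the message can be specific.
        if self.i < len(self.tokens):
            lexeme, offset = self.tokens[self.i]
            found = f"'{lexeme}' at character {offset + 1}"
        else:
            found = "end of input"
        raise ParseError(f"Missing {missing}: found {found}")

    def parse(self):
        tree = self.additive("expression")
        if self.peek() is not None:
            self.fail("operator (+, - or *) or end of expression")
        return tree

    def additive(self, role):
        left = self.multiplicative(role)
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.i += 1
            right = self.multiplicative(f"right-hand operand of '{op}'")
            left = fold(op, left, right)
        return left

    def multiplicative(self, role):
        left = self.primary(role)
        while self.peek() == "*":
            self.i += 1
            left = fold("*", left, self.primary("right-hand operand of '*'"))
        return left

    def primary(self, role):
        tok = self.peek()
        if tok is not None and tok.isascii() and tok.isdigit():
            self.i += 1
            return int(tok)
        if tok is not None and tok.startswith("$") and len(tok) > 1:
            self.i += 1
            return ("var", tok[1:])
        if tok == "(":
            self.i += 1
            inner = self.additive("expression after '('")
            if self.peek() != ")":
                self.fail("closing ')'")
            self.i += 1
            return inner
        self.fail(role)
```

With this sketch, Parser("$x + 2 * 3").parse() returns ("+", ("var", "x"), 6), the constant product having been reduced during the parse, and Parser("1 + * 2").parse() raises ParseError with the message "Missing right-hand operand of '+': found '*' at character 5". Passing the expected role down into each parse function is just one way of getting tailored messages; a real parser would likely carry richer context.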
*Don't build your parser without investing (heavily) in a test-driver and test-suite* - the more tests you can add, the better, even the really weird ones, which may eventually occur 'in nature'. (As Michael Kay hinted, 'or and not' is a valid XPath expression, only one of whose tokens is an operator - I think!)
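As a rough illustration of the kind of table-driven test suite meant here, the following Python sketch classifies tokens using the disambiguation rule from the XPath 1.0 lexical structure (a name or * that follows a preceding token other than @, ::, (, [, a comma or an operator is read as an operator) and then checks a handful of odd-looking cases, including 'or and not'. The token set is deliberately tiny, characters outside it are simply skipped, and the names classify, CASES and run_suite are hypothetical; later XPath versions define their grammars differently in detail.

```python
import re

OPERATOR_NAMES = {"and", "or", "div", "mod"}
# Toy token set: names, '::', and a few single-character symbols.
TOKEN = re.compile(r"[A-Za-z_]\w*|::|[@()\[\],/*]")


def classify(expression):
    """Return (lexeme, kind) pairs, deciding name versus operator by position."""
    labelled = []
    operand_expected = True  # true at the start and after an operator or opener
    for lexeme in TOKEN.findall(expression):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if operand_expected:
                labelled.append((lexeme, "name"))
                operand_expected = False
            elif lexeme in OPERATOR_NAMES:
                labelled.append((lexeme, "operator"))
                operand_expected = True
            else:
                raise ValueError(f"'{lexeme}' is not an operator name")
        elif lexeme == "*":
            kind = "wildcard" if operand_expected else "operator"
            labelled.append((lexeme, kind))
            operand_expected = not operand_expected
        else:
            labelled.append((lexeme, "symbol"))
            # ')' and ']' end an operand; every other symbol here precedes one.
            operand_expected = lexeme not in (")", "]")
    return labelled


# Each case pairs an expression with the expected kind of every token.
CASES = [
    ("or and not", ["name", "operator", "name"]),
    ("and", ["name"]),
    ("div div div", ["name", "operator", "name"]),
    ("* * *", ["wildcard", "operator", "wildcard"]),
    ("not(or) or or",
     ["name", "symbol", "name", "symbol", "operator", "name"]),
    ("@mod mod /or",
     ["symbol", "name", "operator", "symbol", "name"]),
]


def run_suite():
    """Run every case and return a list of (expression, expected, actual) failures."""
    failures = []
    for expression, expected in CASES:
        actual = [kind for _, kind in classify(expression)]
        if actual != expected:
            failures.append((expression, expected, actual))
    return failures
```

Calling run_suite() returns an empty list when every case passes. Adding a new weird case is one more line in CASES, which keeps the cost of growing the suite low.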
-- *John Lumley* MA PhD CEng FIEE john@xxxxxxxxxxxx