Subject: Re: [xsl] NFA to DFA conversion using XSLT|
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Fri, 1 Jun 2007 11:32:32 -0700
Perhaps XSLT is not the right language to implement parsers.
Actually my experience with the FXSL LR parsing Framework has convinced me in the opposite:
It is quite straightforward to implement parsers in XSLT 2.0 using the FXSL generic LR(1) table-driven parser and the parsing tables generated by YACCX.
The JSON parser was implemented in about 1-2 days (whenever I had free time -- so probably just several hours) and most of the effort was in the lexer and in the semantic evaluation -- things that the developer has to provide anyway using *any* such tool.
This can be used right at this moment. Some work, of course, remains, such as providing better error-localization information.
So, anyone is welcome to use this and I would greatly appreciate any remarks and comments.
I hope to have free time to write in my blog more about the FXSL LR parsing Framework.
BTW, does someone know of any RegEx optimizer tool, which will take your RegEx and rewrite it into a more efficient one? How does one know that the RegEx he has written is at least reasonably efficient?
-- Cheers, Dimitre Novatchev --------------------------------------- Truly great madness cannot be achieved without significant intelligence. --------------------------------------- To invent, you need a good imagination and a pile of junk ------------------------------------- You've achieved success in your field when you don't know whether what you're doing is work or play
Hi Mike, Thanks a lot for your reply, and insightful remarks. I now have some new things to ponder about.
After I have read something more about parsing theory, I'll have more questions.
Perhaps XSLT is not the right language to implement parsers. But to learn some concepts from the list, I had put my questions in the context of XSLT.
On 6/1/07, Michael Kay <mike@xxxxxxxxxxxx> wrote: > > > > I am toying with the following idea (a sample state table) to > > represent state table of NFA: > > Looks a perfectly reasonable representation. > > > I am > > following some examples and algorithms from the book, > > "Principles of compiler design - by Aho, Ullman). > > > > One of the algorithms for NFA to DFA conversion states, that > > we essentially follow the below two steps: > > 1) Compute e-CLOSURE (e stands for epsilon transitions) - > > which involves pushing & popping states from a stack. > > 2) The subset construction - which uses e-CLOSURE function to > > construct a DFA. > > I've spent many happy hours studying those three pages of the book, because > Saxon implements those algorithms in its schema processor. > > Doing the epsilon-closure of a state is quite easy, I think. Although the > algorithm given by Aho and Ullmann is procedural, it's not difficult to come > up with a (simpler) algorithm that's purely functional. > > By contrast, the subset construction algorithm doesn't intuitively translate > into functional form at all, I think you would have to devise a completely > different algorithm, and proving its correctness would not be at all easy. > For what it's worth, I implemented an "optimization" to the Aho and Ullman > algorithm and it took nearly three years before I discovered it was > incorrect - it ran all the 6000 or so test cases in the Schema test suite > without problems, but eventually tripped up on a very innocuous-looking > user-written content model. > > I haven't even looked at the state minimization algorithms. I don't see the > point: all the performance issues are to do with the speed of determinizing > the FSA, not with the size of the resulting DFA. Though that might be > because of the lengths XML Schema goes to to disallow ambiguous grammars; if > you were compiling regular expressions it might be more of an issue, I don't > know. > > Michael Kay > http://www.saxonica.com/
-- Regards, Mukul Gandhi