Subject: Re: [xsl] NFA to DFA conversion using XSLT From: "Mukul Gandhi" <gandhi.mukul@xxxxxxxxx> Date: Fri, 1 Jun 2007 23:26:52 +0530 
Hi Mike, Thanks a lot for your reply, and insightful remarks. I now have some new things to ponder about.
Perhaps XSLT is not the right language to implement parsers. But to learn some concepts from the list, I had put my questions in the context of XSLT.
> > I am toying with the following idea (a sample state table) to > represent state table of NFA:
Looks a perfectly reasonable representation.
> I am > following some examples and algorithms from the book, > "Principles of compiler design  by Aho, Ullman). > > One of the algorithms for NFA to DFA conversion states, that > we essentially follow the below two steps: > 1) Compute eCLOSURE (e stands for epsilon transitions)  > which involves pushing & popping states from a stack. > 2) The subset construction  which uses eCLOSURE function to > construct a DFA.
I've spent many happy hours studying those three pages of the book, because Saxon implements those algorithms in its schema processor.
Doing the epsilonclosure of a state is quite easy, I think. Although the algorithm given by Aho and Ullmann is procedural, it's not difficult to come up with a (simpler) algorithm that's purely functional.
By contrast, the subset construction algorithm doesn't intuitively translate into functional form at all, I think you would have to devise a completely different algorithm, and proving its correctness would not be at all easy. For what it's worth, I implemented an "optimization" to the Aho and Ullman algorithm and it took nearly three years before I discovered it was incorrect  it ran all the 6000 or so test cases in the Schema test suite without problems, but eventually tripped up on a very innocuouslooking userwritten content model.
I haven't even looked at the state minimization algorithms. I don't see the point: all the performance issues are to do with the speed of determinizing the FSA, not with the size of the resulting DFA. Though that might be because of the lengths XML Schema goes to to disallow ambiguous grammars; if you were compiling regular expressions it might be more of an issue, I don't know.
Michael Kay http://www.saxonica.com/
 Regards, Mukul Gandhi
Current Thread 


< Previous  Index  Next > 

RE: [xsl] tokenize() a string and t, Bjorndahl, Brad  Thread  Re: [xsl] NFA to DFA conversion usi, Dimitre Novatchev 
RE: [xsl] tokenize() a string and t, Mario Madunic  Date  Re: [xsl] NFA to DFA conversion usi, Mukul Gandhi 
Month 