Subject: Re: [xsl] library for parsing RTF
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Sun, 27 Jun 2010 14:54:59 -0700

> Dimitre Novatchev seems to be the expert on writing parsers in XSLT. Perhaps his next project could be a parser-generator (aka compiler-compiler) - a program that takes a BNF description of the grammar you want to parse, and generates an XSLT stylesheet/library to do the parsing.

I considered this at the time and decided against it: too much effort for an application that would be used very rarely, and at design time only. Once the tool has created the parsing tables required by the generic LR(1) parser, it is never used at run time.

Suitable compiler-compiler systems already exist and can be used instead. I actually modified Berkeley YACC. The modified version works exactly like the original, with a single addition. The original functionality is to accept a BNF description of the grammar you want to parse and to generate parsing tables that a general table-driven LR(1) parser takes as input at run time. With my addition it now has an option to output these parsing tables in XML format, so that the XSLT implementation of the generic parser can use them as input.

The tool is called YACCX and has been available for download from the FXSL CVS for a few years.

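To make the split between "tables generated once" and "generic driver used at run time" concrete, here is a minimal sketch in Python (not XSLT, and not FXSL code). Everything in it is an assumption made for illustration: the XML layout, the toy grammar S -> ( S ) | x, and the names load_tables, lr_parse and on_reduce. It is not claimed to match the format YACCX actually emits.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML table layout for the toy grammar
#   rule 1: S -> ( S )      rule 2: S -> x
# "$" marks end of input. This is NOT the YACCX format.
TABLES_XML = """
<tables>
  <rule id="1" lhs="S" len="3"/>
  <rule id="2" lhs="S" len="1"/>
  <action state="0" on="(" do="shift" arg="2"/>
  <action state="0" on="x" do="shift" arg="3"/>
  <action state="1" on="$" do="accept"/>
  <action state="2" on="(" do="shift" arg="2"/>
  <action state="2" on="x" do="shift" arg="3"/>
  <action state="3" on=")" do="reduce" arg="2"/>
  <action state="3" on="$" do="reduce" arg="2"/>
  <action state="4" on=")" do="shift" arg="5"/>
  <action state="5" on=")" do="reduce" arg="1"/>
  <action state="5" on="$" do="reduce" arg="1"/>
  <goto state="0" on="S" to="1"/>
  <goto state="2" on="S" to="4"/>
</tables>
"""


def load_tables(xml_text):
    """Turn the XML description into plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    rules = {int(r.get("id")): (r.get("lhs"), int(r.get("len")))
             for r in root.iter("rule")}
    action = {(int(a.get("state")), a.get("on")): (a.get("do"), int(a.get("arg", "0")))
              for a in root.iter("action")}
    goto = {(int(g.get("state")), g.get("on")): int(g.get("to"))
            for g in root.iter("goto")}
    return rules, action, goto


def lr_parse(tables, tokens, on_reduce):
    """Generic shift/reduce loop; calls on_reduce(rule_id) at each reduction."""
    rules, action, goto = tables
    stack = [0]                      # stack of parser states
    stream = list(tokens) + ["$"]    # input plus end marker
    pos = 0
    while True:
        entry = action.get((stack[-1], stream[pos]))
        if entry is None:
            raise SyntaxError(f"unexpected {stream[pos]!r} at token {pos}")
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = rules[arg]
            del stack[len(stack) - length:]        # pop one state per RHS symbol
            stack.append(goto[(stack[-1], lhs)])   # then follow the goto entry
            on_reduce(arg)
        else:                                      # accept
            return
```

For example, lr_parse(load_tables(TABLES_XML), "((x))", print) prints 2, 1, 1 as the reductions occur. The driver never changes; only the tables do, which is why the generator is needed at design time only.
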
If anyone is interested in seeing how the parsing tables generated by YACCX look in XML format, here is a link to the parsing tables for JSON:

http://fxsl.cvs.sourceforge.net/viewvc/fxsl/fxsl-xslt2/data/parseTables-Jason.xml?revision=1.1&view=markup

The parser for JSON is here:

http://fxsl.cvs.sourceforge.net/viewvc/fxsl/fxsl-xslt2/f/func-json-document.xsl?revision=1.11&view=markup

Of notable interest is how the generic parser (f:lrParse()) is used:

    <xsl:variable name="vparseResult">
      <xsl:sequence select=
        "f:lrParse($vJasonPPTables,
                   $pstrJson,
                   f:lexer-JSON(),
                   f:OnJSONRuleReduced()
                   )
           /computedValue/node()
        "
      />
    </xsl:variable>

and also the RegEx used by the lexical analyzer. Look at $vRegExJSON, defined in lines 236 - 256:

    236 <xsl:variable name="vRegExJSON" as="xs:string">
    237   ([\s]*) <!-- Skip leading whitespace -->
    238   <!-- Followed by: -->
    239   (
    240     ("[^"\\]*( ( ((\\[\\/bfnrt"]) | (\\u([0-9A-Fa-f]{4})) )[^"\\]*)*")) <!-- A string -->
    241    | <!-- Or a Number -->
    242     ((([-]?[0-9]+)?\.)?[-]?[0-9]+([eE][-+]?[0-9]+)? )
    243    |
    244     ((true)|(false)|(null) <!-- Or true
    245                                 or false or null -->
    246     )
    247
    248    |
    249     ([{},:\[\]]) <!-- Or one of these:
    250                       '{', '}', ':',
    251                       '[', ']' -->
    252
    253   ) <!-- These are all our token types -->
    254   (.*)$ <!-- Only get the first token,
    255              Skip the rest for the future -->
    256 </xsl:variable>

The parsing tables and the parser/lexical analyzer for XPath 2.0 are also available for anyone interested. Be warned that they are much bigger and way too complex.

--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
You've achieved success in your field when you don't know whether what you're doing is work or play

On Sun, Jun 27, 2010 at 2:12 PM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>
>> there is no library and it is not required:
>> since RTF is a textual format, you can use XSLT 2.0 regexp capabilities to parse RTF
>
> For a language as rich as RTF, regular expressions are not going to get you all that far: they are probably only suitable for writing the lexical analyzer (or tokenizer).
>
> Dimitre Novatchev seems to be the expert on writing parsers in XSLT. Perhaps his next project could be a parser-generator (aka compiler-compiler) - a program that takes a BNF description of the grammar you want to parse, and generates an XSLT stylesheet/library to do the parsing.
>
> Michael Kay
> Saxonica

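As a rough illustration of the point that regular expressions suit the lexical analyzer, and of the "skip leading whitespace, take one token, leave the rest" idea behind $vRegExJSON, here is a minimal sketch in Python. It is not the FXSL lexer: the pattern is a simplified JSON token pattern written for this example, and TOKEN and tokenize are hypothetical names.

```python
import re

# One alternation per token type; only the named groups capture,
# so m.lastgroup identifies which token type matched.
TOKEN = re.compile(r'''
    \s*                                                     # skip leading whitespace
    (?:
        (?P<string>"(?:[^"\\]|\\["\\/bfnrt]|\\u[0-9A-Fa-f]{4})*")
      | (?P<number>-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?)
      | (?P<literal>true|false|null)
      | (?P<punct>[{}\[\],:])
    )
''', re.VERBOSE)


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs, one regex match per token."""
    tokens = []
    pos = 0
    end = len(text.rstrip())          # ignore trailing whitespace
    while pos < end:
        m = TOKEN.match(text, pos)    # match only at the current position
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()                 # the rest is left for the next iteration
    return tokens
```

For instance, tokenize('{"a": [1, true]}') yields the punctuation, string, number and literal tokens in order. The token stream is what a table-driven parser would then consume; the regular expression itself does no structural parsing.
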