Subject: Re: [xsl] library for parsing RTF
From: Maurice Mengel <mauricemengel@xxxxxxxxx>
Date: Wed, 30 Jun 2010 09:01:29 +0200
And there are many versions of the RTF specification, several of which seem to be in regular use. My impression is that the differences between the versions add the typical Microsoft non-trivial complexity to RTF.

Thanks for the link to rtf2xml. This looks good.

On Tue, Jun 29, 2010 at 9:51 PM, Kevin Grover <kevin@xxxxxxxxxxxxxxx> wrote:
>
> On Sun, Jun 27, 2010 at 16:07, Andriy Gerasika
> <andriy.gerasika@xxxxxxxxx> wrote:
> >>
> >> For a language as rich as RTF, regular expressions are not going to get
> >> you all that far: they are probably only suitable for writing the
> >> lexical analyzer (or tokenizer).
> >>
> >
> > RTF syntax is not so complex that it requires a BNF parser.
> >
> > assuming the following RTF:
> > {\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
> > This is some {\b bold} text.\par
> > }
> >
> > it can be easily converted w/ regular expressions to something like:
> > <g><rtf>1</rtf><ansi/><g><fonttbl/><f>0</f><fswiss/>Helvetica<sc/></g><f>0</f><pard/>
> > This is some <g><b/>bold</g> text.<par/>
> > </g>
> >
> > where "g" corresponds to RTF's curly braces (group) and "sc" to the semicolon in RTF.
> >
> > not sure if BNF parser will produce something better...
> >
> > This seems about as useful as a regex C compiler that compiles
> >     main() { printf ("Hello world!\n"); }
> > and _nothing_ else.
> >
> > Just because you can make a regex for _one instance_ of a grammar
> does not mean that you can (easily) use regexes to parse a generic
> format. RTF is generic - there are MANY valid ways to say similar
> things in RTF.
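
To make the division of labour discussed in the quoted thread concrete (regular expressions for the tokenizer, something with state for the nested groups), here is a minimal Python sketch. It is an illustration only, not how rtf2xml or any other library works. The names `TOKEN`, `parse_rtf` and `SAMPLE` are made up for this example, and the sketch deliberately ignores parts of RTF that a real parser must handle, such as `\'hh` hex escapes, `\bin` data and destination-specific rules.

```python
import re

# Lexical level: a regular expression is enough to split RTF into tokens.
# Hypothetical, simplified token set: group open/close, control words with an
# optional numeric parameter, control symbols, and runs of plain text.
TOKEN = re.compile(
    r"""
    (?P<open>\{)
  | (?P<close>\})
  | \\(?P<word>[a-zA-Z]+)(?P<param>-?\d+)?[ ]?
  | \\(?P<symbol>[^a-zA-Z])
  | (?P<text>[^\\{}]+)
    """,
    re.VERBOSE,
)


def parse_rtf(source):
    """Return nested lists mirroring the RTF group structure.

    Groups can nest to any depth, so the structure is tracked with an
    explicit stack rather than with the regular expression itself.
    """
    root = []
    stack = [root]
    for m in TOKEN.finditer(source):
        if m.group("open"):
            group = []
            stack[-1].append(group)
            stack.append(group)
        elif m.group("close"):
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
        elif m.group("word"):
            param = m.group("param")
            value = int(param) if param else None
            stack[-1].append(("control", m.group("word"), value))
        elif m.group("symbol"):
            stack[-1].append(("symbol", m.group("symbol")))
        else:
            # Raw line breaks in the RTF source are not part of the text.
            text = m.group("text").replace("\r", "").replace("\n", "")
            if text:
                stack[-1].append(("text", text))
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    return root


SAMPLE = r"""{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
This is some {\b bold} text.\par
}"""

if __name__ == "__main__":
    for item in parse_rtf(SAMPLE)[0]:
        print(item)
```

The regular expression only recognises individual tokens; the stack is what pairs each `{` with its `}` and reports unbalanced input. Interpreting what the control words mean across RTF versions is a further layer that this sketch does not attempt.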