Re: [xsl] library for parsing RTF

Subject: Re: [xsl] library for parsing RTF
From: Kevin Grover <kevin@xxxxxxxxxxxxxxx>
Date: Tue, 29 Jun 2010 12:51:01 -0700
On Sun, Jun 27, 2010 at 16:07, Andriy Gerasika
<andriy.gerasika@xxxxxxxxx> wrote:
>>
>> For a language as rich as RTF, regular expressions are not going to get
>> you all that far: they are probably only suitable for writing the
>> lexical analyzer (or tokenizer).
>>
>
> RTF syntax is not that complex for requiring BNF parser.
>
> assuming the following RTF:
> {\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
> This is some {\b bold} text.\par
> }
>
> it can be easily converted w/ regular expressions to something like:
>
> <g><rtf>1</rtf><ansi/><g><fonttbl/><f>0</f><fswiss/>Helvetica<sc/></g><f>0</f><pard/>
> This is some <g><b/>bold</g> text.<par/>
> </g>
>
> where "g" equals to RTF's curly braces(group) and "sc" to semicolon in RTF.
>
> not sure if BNF parser will produce something better...
>

This seems about as useful as a regex C compiler, that compiles

   main() { printf ("Hello world!\n"); }

and _nothing_ else.

Just because you can write a regex for _one instance_ of a grammar
does not mean that you can (easily) use regexes to parse a generic
format.  RTF is generic - there are MANY valid ways to say similar
things in RTF.
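
To make the split concrete, here is a minimal sketch (not a complete RTF
reader) of the division of labor described earlier in the thread: a regex
does the tokenizing, and a small stack-based parser handles group nesting.
The names TOKEN and parse_rtf, and the tuple shapes it returns, are made up
for this illustration. It assumes plain control words, control symbols, and
text only; \'hh escapes, \bin data, and destinations get no special
handling.

```python
import re

# The regex covers the lexical layer only: braces, control words with an
# optional numeric parameter, control symbols, and runs of plain text.
TOKEN = re.compile(
    r"(?P<open>\{)"
    r"|(?P<close>\})"
    r"|\\(?P<word>[a-zA-Z]+)(?P<param>-?\d+)? ?"
    r"|\\(?P<symbol>[^a-zA-Z])"
    r"|(?P<text>[^\\{}]+)"
)


def parse_rtf(source):
    """Return nested lists; each RTF group becomes one Python list."""
    root = []
    stack = [root]  # stack[-1] is the group currently being filled
    for m in TOKEN.finditer(source):
        if m.group("open"):
            group = []
            stack[-1].append(group)
            stack.append(group)
        elif m.group("close"):
            if len(stack) == 1:
                raise ValueError("unbalanced '}' at offset %d" % m.start())
            stack.pop()
        elif m.group("word"):
            param = m.group("param")
            value = int(param) if param is not None else None
            stack[-1].append(("ctrl", m.group("word"), value))
        elif m.group("symbol"):
            stack[-1].append(("sym", m.group("symbol")))
        else:
            # Raw line breaks carry no meaning in RTF text, so drop them.
            text = m.group("text").replace("\r", "").replace("\n", "")
            if text:
                stack[-1].append(text)
    if len(stack) != 1:
        raise ValueError("unclosed '{'")
    return root
```

Feeding it {\b bold} and \b bold\b0 gives two differently shaped trees for
the same visible result, which is the "many valid ways" problem: whatever
consumes the tree still has to track formatting state across groups, and
that part is not a regex job.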
