Re: [xsl] library for parsing RTF

Subject: Re: [xsl] library for parsing RTF
From: Maurice Mengel <mauricemengel@xxxxxxxxx>
Date: Wed, 30 Jun 2010 09:01:29 +0200
There are also many versions of the RTF specification, several of which
seem to be in regular use. My impression is that the differences between
those versions add the non-trivial complexity typical of Microsoft
formats to RTF.

Thanks for the link to rtf2xml. This looks good.



On Tue, Jun 29, 2010 at 9:51 PM, Kevin Grover <kevin@xxxxxxxxxxxxxxx> wrote:
>
> On Sun, Jun 27, 2010 at 16:07, Andriy Gerasika
> <andriy.gerasika@xxxxxxxxx> wrote:
> >>
> >> For a language as rich as RTF, regular expressions are not going to get
> >> you all that far: they are probably only suitable for writing the
> >> lexical analyzer (or tokenizer).
> >>
> >
> > RTF syntax is not complex enough to require a BNF parser.
> >
> > assuming the following RTF:
> > {\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
> > This is some {\b bold} text.\par
> > }
> >
> > it can be easily converted w/ regular expressions to something like:
> >
> > <g><rtf>1</rtf><ansi/><g><fonttbl/><f>0</f><fswiss/>Helvetica<sc/></g><f>0</f><pard/>
> > This is some <g><b/>bold</g> text.<par/>
> > </g>
> >
> > where "g" corresponds to RTF's curly braces (a group) and "sc" to a
> > semicolon in RTF.
> >
> > not sure if BNF parser will produce something better...
> >
>
> This seems about as useful as a regex C compiler, that compiles
>
>    main() { printf ("Hello world!\n"); }
>
> and _nothing_ else.
>
> Just because you can write a regex for _one instance_ of a grammar
> does not mean that you can (easily) use regexes to parse a generic
> format. RTF is generic - there are MANY valid ways to say similar
> things in RTF.
