Re: [xsl] library for parsing RTF

Subject: Re: [xsl] library for parsing RTF
From: Emmanuel Bégué <eb@xxxxxxxxxx>
Date: Wed, 30 Jun 2010 09:29:27 +0200

On Sun, Jun 27, 2010 at 11:45 PM, Andriy Gerasika
<andriy.gerasika@xxxxxxxxx> wrote:
>
>
> how about this one: http://rtf2xml.sourceforge.net/

This is very good and I've used it a lot; it tries to be very thorough
and succeeds most of the time.
Its drawbacks are:
- it's slow (really slow for big files)
- it requires Python to be installed (which may or may not be a problem)
- the XML result is a little "thick"

For simpler tasks there is a simpler tool (very hard to find on Google
for some reason):
http://memberwebs.com/stef/software/rtfx/

It's written in C and can be built for any platform; it's very fast.
It doesn't try to return every single detail of the source file but I
find it sufficient for most needs (it correctly identifies titles,
lists, emphasis, and tables).

And of course you can always save rtf files in an OpenOffice format,
which is native XML (zipped); this can be done in batch if need be.
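
A rough sketch of that route in Python. Two assumptions are mine: that a
recent LibreOffice `soffice` binary is on the PATH and accepts the
--headless, --convert-to and --outdir options, and that the function
names are only illustrative. The second function relies only on an ODT
file being a zip archive whose body lives in content.xml.

```python
import subprocess
import zipfile
from pathlib import Path


def convert_rtf_to_odt(rtf_dir, out_dir, soffice="soffice"):
    """Convert every .rtf file in rtf_dir to .odt files in out_dir.

    Assumption: `soffice` is a LibreOffice binary that supports
    --headless, --convert-to and --outdir.
    """
    rtf_files = sorted(str(p) for p in Path(rtf_dir).glob("*.rtf"))
    if not rtf_files:
        return []  # nothing to convert, do not start the office process
    subprocess.run(
        [soffice, "--headless", "--convert-to", "odt",
         "--outdir", str(out_dir)] + rtf_files,
        check=True,  # raise if the conversion fails
    )
    return [Path(out_dir) / (Path(f).stem + ".odt") for f in rtf_files]


def read_content_xml(odt_path):
    """Return the document body (content.xml) of an ODT file as bytes."""
    with zipfile.ZipFile(odt_path) as odt:
        return odt.read("content.xml")
```

The bytes returned by read_content_xml() can be written to disk and fed
to an XSLT processor like any other XML file.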

Be aware that each of these tools produces different output, with its
own schema, so you need to choose your parser before writing your
transformations!

(Personally and in hindsight, I'd try rtfx first to see if it's good
enough for what you want to do; it's really the lightest approach of
all three).

- - -

All of the above is for reading RTF; if you need to write RTF, then
it's quite easy to do directly from XSLT; I found this little book to
be very helpful:
http://www.amazon.com/x/dp/0596004753/
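
To give an idea of how little is needed, here is a minimal sketch of
the kind of text a stylesheet in text output mode would have to emit.
It is written in Python rather than XSLT so that it can be run on its
own; the function names are illustrative and the font table is an
arbitrary choice. Only basic RTF control words are used (\rtf1, \ansi,
\fonttbl, \par and \uN for non-ASCII characters).

```python
def rtf_escape(text):
    """Escape a string so it can be placed inside an RTF document."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit UTF-16 code unit, followed by
            # a fallback character ("?") for readers without Unicode.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)


def make_rtf(paragraphs):
    """Build a minimal RTF document from a list of plain-text paragraphs."""
    body = "".join("%s\\par\n" % rtf_escape(p) for p in paragraphs)
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n\\f0 "
    return header + body + "}"
```

In XSLT the same escaping would be done on text nodes before writing
them out, with the header and closing brace emitted from the root
template.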

Regards,
EB
