Re: [xsl] Re: Any Doc to XML converter ?

From: Peter Flynn <peter@xxxxxxxxxxx>
Date: Wed, 20 Jun 2001 23:51:22 +0100
On Tue, 19 Jun 2001, Dmitri wrote:
> Bob DuCharme wrote:
> > In his latest 'XML Deviant' column in
> > (, Leigh Dodds describes
> > and points to a recent thread on the topic.
> From a recent MSDN article 'Export a Word Document to XML' by Kevin McDowell
> (
> 'The XML output by this application is very straightforward and very similar to the
> HTML output by Word itself, but it fully accounts for all styled text, tables, and
> lists.'

Which may very well be true, but the output is largely garbage.
This whole discussion misses the major points:

  1) Iff your Word document is formatted 100% exclusively with
     named styles, robust conversion to meaningful XML is easily
     possible with a number of packages, e.g. Enigma's DynaTag.

  2) If your Word document uses arbitrary manual styling, no
     amount of footling around with conversions is going to
     produce anything other than an XML-syntax'd representation
     of all the styles. You still have to undertake the hardest
     part, which is interpreting all the styling cruft into some
     meaningful markup. XSLT could certainly be used at this stage.

This assumes you do want meaningful markup. If all you need is
the XML representation of the manual styling, then there are
several solutions already discussed.
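
To make point 1 concrete, here is a minimal sketch in Python. It assumes
the paragraphs have already been pulled out of the Word file as
(style name, text) pairs; the mapping table, the element names and the
function name are all hypothetical and are not taken from DynaTag or any
other package.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word named styles to meaningful element names.
STYLE_TO_ELEMENT = {
    "Title": "title",
    "Heading 1": "head",
    "Normal": "para",
    "Quote": "blockquote",
}


def paragraphs_to_xml(paragraphs, root_name="doc"):
    """Build an XML tree from (style_name, text) pairs.

    Raises ValueError for any style missing from the table, because an
    unmapped style needs a human decision rather than a guess.
    """
    root = ET.Element(root_name)
    for style, text in paragraphs:
        if style not in STYLE_TO_ELEMENT:
            raise ValueError(f"no mapping for style {style!r}")
        child = ET.SubElement(root, STYLE_TO_ELEMENT[style])
        child.text = text
    return root


if __name__ == "__main__":
    sample = [("Title", "Annual Report"), ("Normal", "Sales rose.")]
    print(ET.tostring(paragraphs_to_xml(sample), encoding="unicode"))
```

The conversion is a table lookup only because the author already recorded
the meaning in the style name. With manual styling there is no such table
to write.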

It may be instructive that someone last year wrote a short VB
script to turn any DOC file into XML, extracting all the style
info into a CSS stylesheet in a single pass...and it was written
on a laptop on the bus on the way to the airport after a
meeting. I'm sure it has long been superseded, but this is not
rocket science.
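
For illustration only, the single-pass idea might look something like
the Python sketch below. This is not the VB script mentioned above: the
input shape (a list of (formatting, text) runs), the generated class
names and the function name are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET


def runs_to_xml_and_css(runs):
    """Convert (formatting_dict, text) runs to XML plus a CSS stylesheet.

    Each distinct formatting combination gets one generated class name,
    assigned in order of first appearance, in a single pass over the runs.
    """
    root = ET.Element("doc")
    classes = {}  # sorted formatting items -> generated class name
    for fmt, text in runs:
        key = tuple(sorted(fmt.items()))
        name = classes.setdefault(key, f"s{len(classes) + 1}")
        span = ET.SubElement(root, "span", {"class": name})
        span.text = text
    # One CSS rule per distinct formatting combination.
    css = "\n".join(
        f".{name} {{ " + " ".join(f"{prop}: {val};" for prop, val in key) + " }"
        for key, name in classes.items()
    )
    return ET.tostring(root, encoding="unicode"), css
```

Note that the result is exactly what point 2 describes: class names like
s1 and s2 record how the text looked, not what it meant.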

