RE: [xsl] Re: Any Doc to XML converter ?

Subject: RE: [xsl] Re: Any Doc to XML converter ?
From: "Joshua Allen" <joshuaa@xxxxxxxxxxxxx>
Date: Wed, 20 Jun 2001 17:41:07 -0700
The tool described at
http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm
produces very clean XML for me; in what sense is it "mostly garbage"?
You're not thinking of Word's built-in "save as HTML" feature (or
whatever it is called), are you?  You can turn on all sorts of extra
options in this tool that add more "garbage", but with the simple
options the output faithfully represents the document structure and
does a good job with scenario #1 that you listed below.
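
To make scenario #1 concrete, here is a rough sketch of why named
styles convert so easily. It is not the MSDN tool's code: the input
shape (a list of style-name/text pairs), the STYLE_TO_ELEMENT table and
the element names are all assumptions made up for illustration. The
point is only that a named style maps directly onto a meaningful
element, which manual formatting cannot do.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word named styles to meaningful element names.
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Normal": "para",
    "Quote": "blockquote",
}


def paragraphs_to_xml(paragraphs):
    """Turn (style_name, text) pairs into a small XML document string."""
    root = ET.Element("document")
    for style, text in paragraphs:
        name = STYLE_TO_ELEMENT.get(style)
        if name is None:
            # Unmapped style: keep the text, and record the style name
            # so a later pass (XSLT, say) can decide what it means.
            elem = ET.SubElement(root, "para", {"style": style})
        else:
            elem = ET.SubElement(root, name)
        elem.text = text
    return ET.tostring(root, encoding="unicode")
```

Anything that falls through to the style attribute is where scenario #2
begins: the text survives, but somebody still has to decide what the
styling meant.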


> -----Original Message-----
> From: Peter Flynn [mailto:peter@xxxxxxxxxxx]
> Sent: Wednesday, June 20, 2001 3:51 PM
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Re: Any Doc to XML converter ?
> 
> On Tue, 19 Jun 2001, Dmitri wrote:
> > Bob DuCharme wrote:
> >
> > > In his latest 'XML Deviant' column in XML.com
> > > (http://www.xml.com/pub/a/2001/06/13/deviant.html), Leigh Dodds
> > > describes and points to a recent thread on the topic.
> >
> > From a recent MSDN article 'Export a Word Document to XML' by Kevin
> > McDowell
> > (http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm)
> >
> > 'The XML output by this application is very straightforward and very
> > similar to the HTML output by Word itself, but it fully accounts for
> > all styled text, tables, and lists.'
> 
> Which may very well be true, but the output is largely garbage.
> This whole discussion misses the major points:
> 
>   1) Iff your Word document is formatted 100% exclusively with
>      named styles, robust conversion to meaningful XML is easily
>      possible with a number of packages, e.g. Enigma's DynaTag.
> 
>   2) If your Word document uses arbitrary manual styling, no
>      amount of footling around with conversions is going to
>      produce anything other than an XML-syntax'd representation
>      of all the styles. You still have to undertake the hardest
>      part, which is interpreting all the styling cruft into some
>      meaningful markup. XSLT could certainly be used at this
>      stage.
> 
> This assumes you do want meaningful markup. If all you need is
> the XML representation of the manual styling, then there are
> several solutions already discussed.
> 
> It may be instructive that someone last year wrote a short VB
> script to turn any DOC file into XML, extracting all the style
> info into a CSS stylesheet in a single pass...and it was written
> on a laptop in the bus on the way to the airport after a
> meeting. I'm sure it has long been superseded but this is not
> rocket science.
> 
> ///Peter
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list



