Subject: RE: [xsl] Re: Any Doc to XML converter ?
From: "Joshua Allen" <joshuaa@xxxxxxxxxxxxx>
Date: Wed, 20 Jun 2001 17:41:07 -0700
http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm produces
very clean XML for me; in what sense is it "mostly garbage"? You're not
thinking of the built-in "save as HTML" (or whatever it is called), are
you? You can flip on all sorts of options with this tool that add extra
"garbage", but with the simple options it faithfully represents the
structure and does a good job with scenario #1 that you listed below.

> -----Original Message-----
> From: Peter Flynn [mailto:peter@xxxxxxxxxxx]
> Sent: Wednesday, June 20, 2001 3:51 PM
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Re: Any Doc to XML converter ?
>
> On Tue, 19 Jun 2001, Dmitri wrote:
> > Bob DuCharme wrote:
> >
> > > In his latest 'XML Deviant' column in XML.com
> > > (http://www.xml.com/pub/a/2001/06/13/deviant.html), Leigh Dodds
> > > describes and points to a recent thread on the topic.
> >
> > From a recent MSDN article 'Export a Word Document to XML' by Kevin
> > McDowell
> > (http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm)
> >
> > 'The XML output by this application is very straightforward and very
> > similar to the HTML output by Word itself, but it fully accounts for
> > all styled text, tables, and lists. '
>
> Which may very well be true, but the output is largely garbage.
> This whole discussion misses the major points:
>
> 1) Iff your Word document is formatted 100% exclusively with
>    named styles, robust conversion to meaningful XML is easily
>    possible with a number of packages, eg Enigma's DynaTag.
>
> 2) If your Word document uses arbitrary manual styling, no
>    amount of footling around with conversions is going to
>    produce anything other than an XML-syntax'd representation
>    of all the styles. You still have to undertake the hardest
>    part, which is interpreting all the styling cruft into some
>    meaningful markup. XSLT could certainly be used at this
>    stage.
>
> This assumes you do want meaningful markup. If all you need is
> the XML representation of the manual styling, then there are
> several solutions already discussed.
>
> It may be instructive that someone last year wrote a short VB
> script to turn any DOC file into XML, extracting all the style
> info into a CSS stylesheet in a single pass...and it was written
> on a laptop in the bus on the way to the airport after a
> meeting. I'm sure it has long been superseded but this is not
> rocket science.
>
> ///Peter
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
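
To make the two scenarios in the quoted message concrete, here is a
minimal Python sketch of what "named styles map cleanly to meaningful
markup" could look like. Everything in it is an assumption made for
illustration: the input shape (<p style="..."> elements under a <doc>
root), the style names, the target element names, and the function
styled_to_semantic are hypothetical, and this is not the output format
of the MSDN tool or of DynaTag.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word style names to meaningful element names.
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Body Text": "para",
    "Quote": "blockquote",
}


def styled_to_semantic(source_xml: str) -> str:
    """Rename each <p style="..."> to the element its named style maps to.

    Paragraphs with no named style, or an unknown one, are kept as
    <unmapped> so that they stay visible for interpretation by hand.
    """
    src = ET.fromstring(source_xml)
    out = ET.Element("document")
    for p in src.iter("p"):
        style = p.get("style")
        name = STYLE_TO_ELEMENT.get(style)
        if name is None:
            # Scenario 2: manual styling carries no meaning we can map.
            el = ET.SubElement(out, "unmapped", {"style": style or ""})
        else:
            # Scenario 1: a named style maps directly to an element.
            el = ET.SubElement(out, name)
        el.text = p.text
    return ET.tostring(out, encoding="unicode")


sample = (
    '<doc>'
    '<p style="Heading 1">Intro</p>'
    '<p style="Body Text">Hello</p>'
    '<p>Manually bolded text</p>'
    '</doc>'
)
print(styled_to_semantic(sample))
```

The table lookup is the easy part. The <unmapped> leftovers are the part
the quoted message calls the hardest, and the same mapping could equally
be written as XSLT templates.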