Re: [xsl] Re: Any Doc to XML converter ?

Subject: Re: [xsl] Re: Any Doc to XML converter ?
From: "Michael Beddow" <mbnospam@xxxxxxxxxxx>
Date: Tue, 19 Jun 2001 19:55:15 +0100
I don't enjoy defending MS, but in all fairness:

> But from my perspective that is grossly misleading. The HTML that
> Word exports is trash full of Microsoft specific extensions that
> most web sites don't want. So if their XML is similar, that's
> not saying much.

Kevin McDowell's MSDN article admits that. Though I don't see why
"Microsoft specific extensions" (= namespaces??) in XML should be any
worse than anybody else's duly defined namespaces and/or elements (e.g.
those used in OpenOffice XML output). And they do their job: full
round-tripping between Word and HTML without loss of information or
formatting. Users wanted that, and they've now got it. It could be done
more cleanly (OpenOffice does so), but what couldn't?

The trash in the "save as HTML" output from Word 2K lies in crass
goofs like unquoted attribute values, etc. The whole point is that McDowell
describes a method of getting much better results by another route. I
personally don't really want to go down that route, but I'm glad to see
it on offer. Like Bob, I can't get the example code to work reliably on
anything other than the example data, but that's not all that unusual.
With a bit more work, the routines in the cited article could probably
be made to output XML that was more or less as you wished. So credit
where it's due, I say...

Michael Beddow
XML and the Humanities page:

----- Original Message -----
From: <sara.mitchell@xxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Tuesday, June 19, 2001 6:35 PM
Subject: RE: [xsl] Re: Any Doc to XML converter ?

> Well, I understand why Microsoft thinks this (although I violently
> disagree):
> > From a recent MSDN article "Export a Word Document to XML" by
> > Kevin McDowell
> > (
> >
> > "The XML output by this application is very straightforward
> > and very similar to the
> > HTML output by Word itself, but it fully accounts for all
> > styled text, tables, and
> > lists. "
> >
> > and
> >
> > "Conclusion
> > This solution provides a starting point to build an XML
> > parser for Word documents.
> > In addition to the XML functionality, it discusses how to
> > build custom objects to
> > handle sequential instances of all styles and graphics and
> > how to loop through
> > tables and lists. Remember, documents shouldn't be converted
> > to XML merely for the
> > sake of putting them in XML. The best document to convert to XML
> > is one that makes use
> > of styles and will be reused in other ways."
> And it --completely-- ignores something even more fundamental, which
> is that most people using Word to create documents couldn't care less
> about good structure, consistency, and much of the modelling that
> makes information truly reusable. If you start with trash, guess what
> you end up with? So the XML from Word isn't going to give people the
> benefit they think it will (and that this article implies).
> Sara Mitchell
>  XSL-List info and archive:
