Subject: RE: [xsl] How to Handle Bad XML (or Word HTML)
From: "Joshua Allen" <joshuaa@xxxxxxxxxxxxx>
Date: Tue, 11 Mar 2003 13:57:24 -0800
The best bet is to use HTML Tidy to tidy it up: http://tidy.sourceforge.net

Tidy even has a mode specifically for MS-Word.

Also note that Word in Office 11 (currently in Beta 2) supports round-tripping of documents as well-formed XML.

> -----Original Message-----
> From: Ted Stresen-Reuter [mailto:tedmasterweb@xxxxxxx]
> Sent: Tuesday, March 11, 2003 1:37 PM
> To: xsl-List@xxxxxxxxxxxxxxxxxxxxxx
>
> Hi,
>
> Thanks again to everyone who answers on this list. You've all been
> really sweet.
>
> Today's question hopes to try and tackle a transformation of the HTML
> produced by MS Word into a valid XHTML format.
>
> In general, the problem is Word doesn't produce "valid" XML
> (specifically, for many elements, attributes are not quoted). The file
> I'm working with starts with the following:
>
> <html xmlns:o="urn:schemas-microsoft-com:office:office"
> xmlns:w="urn:schemas-microsoft-com:office:word"
> xmlns="http://www.w3.org/TR/REC-html40">
>
> Additionally, a typical element might look like this:
>
> <p class=MsoNormal style='text-align:justify;mso-hyphenate:none'><![if !supportEmptyParas]> <![endif]><o:p></o:p></p>
>
> Is it even possible to use such a document as a source document and if
> so, how do I handle errors returned by the XSLT processor when unquoted
> attributes are found?
>
> Thanks again to all of you who take the time to read and actually
> answer these queries.
>
> Sincerely,
>
> Ted Stresen-Reuter
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
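
To make the idea of a cleanup pass concrete, here is a minimal sketch in Python. It is not HTML Tidy and does not come from this thread; it only illustrates the general technique a tidy step relies on: parse the markup leniently, then re-serialize it with every attribute quoted and every element closed, so that an XSLT processor can read the result. The names `Requoter`, `requote` and `VOID` are made up for this illustration, and the sketch assumes that Word's `<![if ...]>` conditional markers can simply be dropped. A real tool handles many more cases (entities, nesting rules, character encodings).

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never have content in HTML (a partial list, for illustration).
VOID = {"br", "hr", "img", "meta", "link", "input", "area", "base", "col"}


class Requoter(HTMLParser):
    """Hypothetical helper: parse HTML leniently, re-emit it as well-formed markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # serialized output fragments
        self.open_tags = []  # stack of elements still waiting for an end tag

    def handle_starttag(self, tag, attrs):
        # Quote every value; a valueless attribute such as `nowrap`
        # becomes nowrap="nowrap".
        rendered = "".join(
            " %s=%s" % (name, quoteattr(name if value is None else value))
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s/>" % (tag, rendered))
        else:
            self.out.append("<%s%s>" % (tag, rendered))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.open_tags:
            return  # stray or redundant end tag: drop it
        # Close any elements left open inside the one being closed.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    # Comments, doctype declarations and Word's <![if ...]> markers are not
    # handled here, so they are silently dropped (an assumption of this sketch).


def requote(html_text):
    """Return html_text with quoted attributes and balanced tags."""
    parser = Requoter()
    parser.feed(html_text)
    parser.close()
    while parser.open_tags:  # close whatever the source left open
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)


if __name__ == "__main__":
    word_html = ("<p class=MsoNormal style='text-align:justify;"
                 "mso-hyphenate:none'><o:p></o:p></p>")
    print(requote(word_html))
```

The output still uses the `o:` prefix, so it is only namespace-well-formed if the root element keeps the `xmlns:o` declaration from the original Word file.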