RE: [xsl] How to Handle Bad XML (or Word HTML)

Subject: RE: [xsl] How to Handle Bad XML (or Word HTML)
From: "Joshua Allen" <joshuaa@xxxxxxxxxxxxx>
Date: Tue, 11 Mar 2003 13:57:24 -0800
The best bet is to use HTML Tidy to tidy it up:
http://tidy.sourceforge.net

Tidy even has a mode specifically for MS Word.
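
As a rough sketch of how that could be scripted, the Python below shells out to
Tidy before the file ever reaches the XSLT processor. It assumes a `tidy`
binary on the PATH and the option names used by current HTML Tidy releases
(`-asxhtml`, `--word-2000`, `--numeric-entities`); the function names and file
names are made up for illustration, so check them against your own setup.

```python
import shutil
import subprocess


def build_tidy_command(input_path, output_path):
    """Return the argument list for a Tidy run that emits XHTML.

    Hypothetical helper: the option names are assumed to match the
    installed HTML Tidy release.
    """
    return [
        "tidy",
        "-quiet",                     # suppress nonessential output
        "-asxhtml",                   # convert the HTML to well-formed XHTML
        "--word-2000", "yes",         # strip Word-specific markup
        "--numeric-entities", "yes",  # write &nbsp; as a numeric reference
        "-output", output_path,
        input_path,
    ]


def tidy_word_html(input_path, output_path):
    """Run Tidy on a Word HTML export and return Tidy's exit status."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(
        build_tidy_command(input_path, output_path),
        capture_output=True,
        text=True,
    )
    # Tidy exits with 0 for success, 1 for warnings and 2 for errors.
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.returncode
```

Calling `tidy_word_html("word.html", "clean.xhtml")` should leave a file that
an XSLT processor can parse. Numeric entities are requested because a parser
that does not read the XHTML DTD will reject `&nbsp;` as an undefined entity.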

Also note that Word in Office 11 (currently in Beta 2) supports
round-tripping of documents as well-formed XML.

> -----Original Message-----
> From: Ted Stresen-Reuter [mailto:tedmasterweb@xxxxxxx]
> Sent: Tuesday, March 11, 2003 1:37 PM
> To: xsl-List@xxxxxxxxxxxxxxxxxxxxxx
> 
> Hi,
> 
> Thanks again to everyone who answers on this list. You've all been
> really sweet.
> 
> Today's question hopes to try and tackle a transformation of the HTML
> produced by MS Word into a valid XHTML format.
> 
> In general, the problem is Word doesn't produce "valid" XML
> (specifically, for many elements, attributes are not quoted). The file
> I'm working with starts with the following:
> 
> <html xmlns:o="urn:schemas-microsoft-com:office:office"
> xmlns:w="urn:schemas-microsoft-com:office:word"
> xmlns="http://www.w3.org/TR/REC-html40">
> 
> Additionally, a typical element might look like this:
> 
> <p class=MsoNormal style='text-align:justify;mso-hyphenate:none'><![if
> !supportEmptyParas]>&nbsp;<![endif]><o:p></o:p></p>
> 
> Is it even possible to use such a document as a source document and if
> so, how do I handle errors returned by the XSLT processor when
> unquoted attributes are found?
> 
> Thanks again to all of you who take the time to read and actually
> answer these queries.
> 
> Sincerely,
> 
> Ted Stresen-Reuter
> 
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> 



