RE: [xsl] How to Handle Bad XML (or Word HTML)

Subject: RE: [xsl] How to Handle Bad XML (or Word HTML)
From: "Koes, Derrick" <Derrick.Koes@xxxxxxxxxxxxxxxx>
Date: Tue, 11 Mar 2003 16:54:01 -0500
http://www.w3.org/People/Raggett/tidy/

Have you tried using tidy?
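
In case it helps, here is a minimal sketch of driving HTML Tidy from a script so that its output can be fed to the XSLT processor. It assumes a `tidy` executable is on the PATH; the helper names `tidy_command` and `run_tidy` and the file names are made up for illustration, and the option set is only a starting point for Word output, not a recommended configuration.

```python
import shutil
import subprocess


def tidy_command(src, dest):
    """Build an HTML Tidy command line that converts src to XHTML in dest."""
    return [
        "tidy",
        "-quiet",              # suppress non-essential output
        "-asxhtml",            # write the result as well-formed XHTML
        "-numeric",            # emit numeric references instead of &nbsp; etc.
        "--word-2000", "yes",  # ask tidy to strip Word 2000 specific markup
        "-output", dest,       # write to dest instead of stdout
        src,
    ]


def run_tidy(src, dest):
    """Run tidy if it is installed; return its exit status, or None if absent."""
    if shutil.which("tidy") is None:
        return None
    # Tidy exits with 0 on success, 1 if there were warnings, 2 on errors.
    return subprocess.run(tidy_command(src, dest)).returncode
```

For example, `run_tidy("word.html", "word.xhtml")` (hypothetical file names) would leave a cleaned-up file that can then be used as the transformation source.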


-----Original Message-----
From: Ted Stresen-Reuter [mailto:tedmasterweb@xxxxxxx] 
Sent: Tuesday, March 11, 2003 4:37 PM
To: xsl-List@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] How to Handle Bad XML (or Word HTML)

Hi,

Thanks again to everyone who answers on this list. You've all been 
really sweet.

Today's question is about transforming the HTML produced by MS Word 
into valid XHTML.

In general, the problem is that Word doesn't produce well-formed XML 
(specifically, for many elements, attribute values are not quoted). The 
file I'm working with starts with the following:

<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">

Additionally, a typical element might look like this:

<p class=MsoNormal style='text-align:justify;mso-hyphenate:none'><![if 
!supportEmptyParas]>&nbsp;<![endif]><o:p></o:p></p>

Is it even possible to use such a document as a source document and if 
so, how do I handle errors returned by the XSLT processor when unquoted 
attributes are found?
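
To illustrate the question, here is a small sketch of why the errors cannot really be handled inside the stylesheet: an unquoted attribute is a well-formedness error, so a conforming XML parser rejects the document before any transformation starts. The sketch uses Python's standard `xml.etree.ElementTree` parser as a stand-in for whatever parser sits in front of the XSLT processor; the snippets are shortened versions of the element above, and `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened from the Word output above: the class attribute is unquoted.
WORD_SNIPPET = "<p class=MsoNormal style='text-align:justify'>text</p>"

# The same element with every attribute value quoted.
QUOTED_SNIPPET = "<p class='MsoNormal' style='text-align:justify'>text</p>"


def is_well_formed(markup):
    """Return True if markup parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True
```

Here `is_well_formed(WORD_SNIPPET)` is False and `is_well_formed(QUOTED_SNIPPET)` is True, which suggests the Word file has to be cleaned up before it reaches the processor rather than repaired during the transformation.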

Thanks again to all of you who take the time to read and actually 
answer these queries.

Sincerely,

Ted Stresen-Reuter


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
