Subject: Converting poorly formed HTML into well-formed XML
From: Joseph Fourness <josephf@xxxxxxxxxxx>
Date: Tue, 26 Sep 2000 15:56:20 -0700
Hello,

I am currently developing a system that converts arbitrary poorly formed HTML into well-formed XML (or XHTML).

Example of HTML:

    <TD valign=TOP width="100"> <br> <A href="http://www.mulberrytech.com" target=_top>Link</a>

The HTML has been written by various web developers over a period of time, so it is very inconsistent in formatting, use of quotation marks in attributes, etc. I need to convert these files (approx. 120,000) into XHTML so that they can be used with an XSLT processor.

Desired output:

    <td valign="top" width="100"> <br/> <a href="http://www.mulberrytech.com" target="_top">Link</a>

Does XSLT have the facilities to read in the poorly formed HTML directly? If so, what needs to be done?

Or would a custom parser be the best solution, one that builds a DOM from the poorly formed HTML, which is then either written to an XML file or processed directly by an XSLT stylesheet?

I've already begun developing the latter (custom) solution, but thought I'd double-check to see if there are any HTML -> XHTML converters available.

Thanks in advance for your help,

Joe Fourness

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
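For illustration, the custom-parser route described above could be sketched roughly as follows. This is a minimal sketch, not the approach actually taken in the system under development: it assumes Python's standard `html.parser` module as the tolerant tokenizer, and the names `XhtmlWriter`, `to_xhtml`, `VOID`, and `ENUM_ATTRS` are hypothetical. Lowercasing the values of `align` and `valign` is an assumption made only to match the desired output shown above. Comments, doctypes, and processing instructions are silently dropped in this sketch.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never have content in HTML; emitted as <tag/>.
VOID = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col", "param"}
# Assumption: attributes whose values are case-insensitive keywords.
ENUM_ATTRS = {"align", "valign"}


class XhtmlWriter(HTMLParser):
    """Re-emits tolerant HTML tokens as well-formed XML text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments
        self.stack = []  # currently open (non-void) elements

    def _open(self, tag, attrs, self_close):
        # HTMLParser already lowercases tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:
                value = name  # expand minimized attributes, e.g. nowrap
            elif name in ENUM_ATTRS:
                value = value.lower()
            parts.append(f"{name}={quoteattr(value)}")
        self.out.append("<" + " ".join(parts) + ("/>" if self_close else ">"))

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            self._open(tag, attrs, True)
        else:
            self._open(tag, attrs, False)
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, True)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.stack:
            return  # ignore stray end tags
        # Close any elements left open inside the one being closed.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def close(self):
        super().close()  # flushes any buffered text
        while self.stack:  # close whatever is still open at end of input
            self.out.append(f"</{self.stack.pop()}>")


def to_xhtml(html):
    writer = XhtmlWriter()
    writer.feed(html)
    writer.close()
    return "".join(writer.out)


sample = '<TD valign=TOP width="100"> <br> <A href="http://www.mulberrytech.com" target=_top>Link</a>'
print(to_xhtml(sample))
```

The sketch only repairs a fragment at the token level (case, quoting, empty elements, unclosed tags). It does not apply HTML's implied-end-tag rules, so input such as consecutive unclosed `<p>` or `<li>` elements would come out nested rather than as siblings; a real converter would need those rules as well.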