Subject: Re: Converting poorly formed HTML into well-formed XML
From: "Steve Muench" <smuench@xxxxxxxxxxxxx>
Date: Tue, 26 Sep 2000 16:55:50 -0700

| Does XSLT have the facilities to directly
| read in the poorly formed HTML?

No built-in features to do this.

I'd recommend leveraging Andy Quick's excellent (open source)
Java implementation of Dave Raggett's HTML "Tidy" utility,
called JTidy.

  http://www3.sympatico.ca/ac.quick/jtidy.html

It can expose a DOM API to the "tidied-up" (that is, well-formed)
XML tree for any ill-formed HTML document. You can then pass the
DOM Document into your XSLT engine for transformation.

In my about-to-be-released book "Building Oracle XML Applications"
from O'Reilly, I had occasion to use this JTidy library to show
readers how to take ill-formed HTML and use XSLT to "scrape"
interesting data out of the "tidied-up" XML result from dynamic
web pages like stock quote services or other online sources of
information.

______________________________________________________________
Steve Muench, Lead XML Evangelist & Consulting Product Manager
BC4J & XSQL Servlet Development Teams, Oracle Rep to XSL WG
Author "Building Oracle XML Applications", O'Reilly
http://www.oreilly.com/catalog/orxmlapp/

| Does XSLT have the facilities to directly read in the poorly formed HTML?
| And if so, what needs to be done.
|
| Or,
|
| Will designing a custom parser that builds a DOM from the poorly formed HTML
| to then be output to an XML file, or directly processed by an XSLT document,
| be the best solution.
|
| I've already begun developing the latter (custom) solution, but thought I'd
| double check to see if there are any HTML -> XHTML converters available.
|
| Thanks in advance for your help,
|
| Joe Fourness

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
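
The approach in the reply above (obtain a DOM for the tidied page, then hand that DOM to an XSLT engine) can be sketched roughly as follows. This is a minimal illustration, not the code from the book. It assumes the standard JAXP transformation API (`javax.xml.transform`); the class name, stylesheet, and sample markup are hypothetical. To keep the sketch runnable without the JTidy jar, a stand-in DOM is parsed with the JDK's own parser; the comment shows where JTidy's `Tidy.parseDOM(InputStream, OutputStream)` would supply the Document for real, ill-formed HTML.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class TidyThenTransform {

    // Hypothetical stylesheet: "scrapes" the text of every <td class="price">.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select=\"//td[@class='price']\">"
      + "<xsl:value-of select='.'/><xsl:text>\n</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Stand-in for the tidied DOM. With JTidy on the classpath this would be
    // something like (assumption, not exercised here):
    //   org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy();
    //   tidy.setXmlOut(true);
    //   Document doc = tidy.parseDOM(htmlInputStream, null);
    static Document standInForTidiedDom() throws Exception {
        String xml = "<html><body><table>"
                   + "<tr><td class='price'>12.50</td></tr>"
                   + "<tr><td class='price'>13.75</td></tr>"
                   + "</table></body></html>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Passes the DOM Document straight into the XSLT engine and returns the result text.
    static String transform(Document doc, String stylesheet) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(transform(standInForTidiedDom(), STYLESHEET));
    }
}
```

If a given XSLT processor does not accept JTidy's DOM implementation directly, one possible fallback is to have JTidy write the tidied XML to a stream and feed that to the processor as a `StreamSource` instead.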