Re: Converting poorly formed HTML into well-formed XML

Subject: Re: Converting poorly formed HTML into well-formed XML
From: "Steve Muench" <smuench@xxxxxxxxxxxxx>
Date: Tue, 26 Sep 2000 16:55:50 -0700
| Does XSLT have the facilities to directly 
| read in the poorly formed HTML?

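No, XSLT has no built-in facility for this: an XSLT processor expects well-formed XML (or an equivalent tree such as a DOM) as its input.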

I'd recommend leveraging Andy Quick's excellent (open source)
Java implementation of Dave Raggett's HTML "Tidy" utility, called JTidy.

It can expose a DOM API to the "tidied-up" (that is, well-formed)
XML tree for any ill-formed HTML document. You can then pass
the DOM Document into your XSLT engine for transformation.
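The parse-to-DOM-then-transform pipeline described above can be sketched with nothing but the JDK. This is a minimal sketch under stated assumptions, not JTidy itself: because JTidy is not part of the JDK, the example swaps in the JDK's lenient Swing HTML parser (javax.swing.text.html.parser.ParserDelegator, which only knows the HTML 3.2 DTD) and rebuilds its callbacks into a well-formed W3C DOM. That DOM is then handed to the JAXP XSLT engine as a DOMSource. The class name TidyThenTransform, the SCRAPE_XSLT stylesheet and the stock-quote markup are hypothetical. JTidy's org.w3c.tidy.Tidy class is understood to offer parseDOM(InputStream, OutputStream) for the DOM-building step; check the JTidy documentation for your version before relying on it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Enumeration;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TidyThenTransform {

    // Hypothetical "scraping" stylesheet: prints cell1=cell2; for every table row.
    static final String SCRAPE_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/><xsl:template match='/'>"
        + "<xsl:for-each select='//tr'>"
        + "<xsl:value-of select='td[1]'/><xsl:text>=</xsl:text>"
        + "<xsl:value-of select='td[2]'/><xsl:text>;</xsl:text>"
        + "</xsl:for-each></xsl:template></xsl:stylesheet>";

    // Stand-in for JTidy (assumption): parse lenient HTML with the JDK's Swing
    // parser and rebuild the events as a well-formed DOM under one <html> root.
    static Document htmlToDom(String html) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("html");
        doc.appendChild(root);
        HTMLEditorKit.ParserCallback builder = new HTMLEditorKit.ParserCallback() {
            Node current = root; // innermost element that is still open

            Element element(HTML.Tag t, MutableAttributeSet a) {
                Element e = doc.createElement(t.toString());
                for (Enumeration<?> names = a.getAttributeNames(); names.hasMoreElements(); ) {
                    Object name = names.nextElement();
                    Object value = a.getAttribute(name);
                    // Copy real HTML attributes only; skip the parser's bookkeeping keys.
                    if (name instanceof HTML.Attribute && value instanceof String) {
                        e.setAttribute(name.toString(), (String) value);
                    }
                }
                return e;
            }

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.HTML || t instanceof HTML.UnknownTag) return; // root exists; drop unknown tags
                current = current.appendChild(element(t, a));
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                // Close the nearest open element with this name; ignore stray end tags.
                for (Node n = current; n != root; n = n.getParentNode()) {
                    if (n.getNodeName().equals(t.toString())) { current = n.getParentNode(); return; }
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // Empty elements such as <br> or <img>; unknown or stray end tags are dropped.
                if (t instanceof HTML.UnknownTag || a.isDefined(HTML.Attribute.ENDTAG)) return;
                current.appendChild(element(t, a));
            }

            @Override
            public void handleText(char[] data, int pos) {
                current.appendChild(doc.createTextNode(new String(data)));
            }
        };
        new ParserDelegator().parse(new StringReader(html), builder, true);
        return doc;
    }

    // Hand the DOM to the JAXP XSLT engine, as the post suggests doing with JTidy's DOM.
    static String transform(Document dom, String xslt) throws Exception {
        Transformer tx = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        tx.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Ill-formed input: no </td>, </tr>, </body> or </html> end tags.
        String html = "<html><body><table><tr><td>ORCL<td>42.5"
            + "<tr><td>SUNW<td>99.1</table>";
        System.out.println(transform(htmlToDom(html), SCRAPE_XSLT));
    }
}
```

Because the builder only ever appends to the innermost open element and silently drops end tags that match nothing, the resulting DOM is well-formed however badly the input nests. It does not attempt the deeper repairs that Tidy performs, so for real-world pages JTidy remains the better tool.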

In my about-to-be-released book "Building Oracle XML Applications"
from O'Reilly, I had occasion to use this JTidy library to show
readers how to take ill-formed HTML and use XSLT to "scrape" 
interesting data out of the "tidied"-up XML result from dynamic
web pages like stock quote services or other online sources of 

Steve Muench, Lead XML Evangelist & Consulting Product Manager
BC4J & XSQL Servlet Development Teams, Oracle Rep to XSL WG
Author "Building Oracle XML Applications", O'Reilly

| Does XSLT have the facilities to directly read in the poorly formed HTML?
| And if so, what needs to be done?
| Or,
| Will designing a custom parser that builds a DOM from the poorly formed HTML
| to then be output to an XML file, or directly processed by an XSLT document,
| be the best solution?
| I've already begun developing the latter (custom) solution, but thought I'd
| double check to see if there are any HTML -> XHTML converters available.
| Thanks in advance for your help,
| Joe Fourness
|  XSL-List info and archive:
