Subject: Re: [xsl] Transforming the XSL-List archive into RSS 1.0
From: "J.Pietschmann" <j3322ptm@xxxxxxxx>
Date: Thu, 03 Jul 2003 20:54:51 +0200
"Jimmy Cerra" wrote:
> However, when I looked at the source of the list, I noticed that the
> pages are served as the SGML-flavor of HTML. :-( XSL can't really work
> with this because of the unbalanced tags (<li>, <br>, etcetera).
> However, the pages do validate as HTML 4.01 Strict. How do I work
> around the unfortunate format and convert it to sensible XML?

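The common approach is to run the HTML through HTML Tidy (or whatever equivalent your toolset provides) to convert it to XHTML, which the XSLT processor can then read as ordinary XML. Check out JTidy for integration with Java-based XSLT processors. Another approach is to use an HTML parser, such as Xerces in HTML mode, which accepts the unbalanced tags and builds a proper tree that can be handed to the XSLT processor.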