Subject: RE: [xsl] html2xml?
From: naha@xxxxxxxxxx
Date: Wed, 27 Mar 2002 08:54:02 -0500 (EST)
Quoting Jarno.Elovirta@xxxxxxxxx:

> Hi,
>
> > Has anyone done html to xml transformation?
> > Is it possible? If yes...how? A small example would be great =)
>
> Run the HTML document through Tidy/JTidy/SX/OpenXML and then process
> like a normal XML document.

I recently tried Tidy (http://www.w3.org/People/Raggett/tidy/) for this but found it overly aggressive in its enforcement of the HTML DTD. For example, it transformed

    <a href="some-url">
      <div class="style">anchor text</div>
    </a>

into

    <a href="some-url">
    </a>
    <div class="style">anchor text</div>

which affects the semantics of the document: the anchor text is no longer inside the link. I've not found a configuration parameter to control this behavior. Wouldn't it be more correct to transform to

    <div class="style">
      <a href="some-url">anchor text</a>
    </div>

instead?

I was originally hoping for an all-XSL solution to my problem, but since it involves capturing and processing a tree (more like a shrub) of cross-referenced web pages, all of which need to be converted from HTML to XML first, I've started writing a Java program for this. I was hoping to use the HEX parser (http://www-uk.hpl.hp.com/people/sth/java/hex.html), but the version I fetched appears to be buggy and the author's email address is no longer valid.

I'm not familiar with any of the other converters you suggested. Google found what are apparently two different "OpenXML"s, one written in Java and one in Delphi. Could you provide a URL for the one you suggested? The only information I found about the Java one was on CNET (http://download.cnet.com/downloads/0-14492-100-5565652.html), and the site it refers to as the "publisher" (http://www.openxml.org/) seems to be a shopping site.

This topic would be a great candidate for a FAQ. I didn't find one on Dave Pawson's site.

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
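For illustration, here is a minimal Java sketch of the rewrite suggested above, where an anchor wrapping a single div becomes a div wrapping the anchor. It uses only the JDK's DOM and JAXP classes and assumes the input is already well-formed XML. The class name `AnchorDivSwap` and the method `swap` are hypothetical names chosen for this example; they are not part of Tidy or of any other tool mentioned in this thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: rewrites <a><div>text</div></a> as <div><a>text</a></div>.
final class AnchorDivSwap {

    // Returns the single element child of e, or null if e has zero or several
    // element children, or any non-whitespace text of its own.
    private static Element onlyElementChild(Element e) {
        Element found = null;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                if (found != null) return null;
                found = (Element) n;
            } else if (n.getNodeType() == Node.TEXT_NODE
                    && !n.getNodeValue().trim().isEmpty()) {
                return null;
            }
        }
        return found;
    }

    static String swap(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // getElementsByTagName returns a live list, so snapshot it before editing.
            NodeList live = doc.getElementsByTagName("a");
            List<Element> anchors = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                anchors.add((Element) live.item(i));
            }

            for (Element a : anchors) {
                Element div = onlyElementChild(a);
                if (div == null || !"div".equals(div.getTagName())) continue;

                Node parent = a.getParentNode();
                a.removeChild(div);           // detach the div from the anchor
                parent.replaceChild(div, a);  // the div takes the anchor's place
                while (a.hasChildNodes()) {   // drop leftover whitespace in the anchor
                    a.removeChild(a.getFirstChild());
                }
                while (div.hasChildNodes()) { // the div's content moves into the anchor
                    a.appendChild(div.getFirstChild());
                }
                div.appendChild(a);           // the anchor now sits inside the div
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }
}
```

Calling `AnchorDivSwap.swap(...)` on a fragment that contains the first example above should give the div-around-anchor form. Anchors with any other content are left untouched, which is a deliberately conservative choice for this sketch.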