RE: html to xml

Subject: RE: html to xml
From: Sebastian Rahtz <sebastian.rahtz@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 27 Oct 2000 10:31:29 +0100
Lisa van Gelder writes:
 > The basic problem is that the html you are getting is not structured enough
 > for your purposes.
 > 
 > I had the same problem, and solved it by setting rules for how the html
 > could be structured, so it could be converted into xml more easily. I do not
 > allow any text that is not surrounded by tags.

I was afraid someone would say that. My problem is that the task is to
convert our existing web pages (6196 documents, at last count) to XML
conforming to the TEI DTD, so I have no control over the original coding.
The conclusion, I guess, is "clean up the HTML minimally even before
running tidy".
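When the source pages cannot be controlled, Lisa's rule (no text outside
tags) can still be useful as a triage check. It can flag which documents
need hand cleanup before tidy is run. Below is a minimal sketch using
Python's standard html.parser module. The helper name find_bare_text is
hypothetical. Reading "not surrounded by tags" as "text whose nearest open
element is <body>" is also an assumption. Neither is part of either
poster's actual workflow.

```python
from html.parser import HTMLParser


class BareTextFinder(HTMLParser):
    """Collect non-whitespace text whose nearest open element is <body>.

    Assumption: "text not surrounded by tags" is read as text sitting
    directly inside <body> rather than inside a child element like <p>.
    """

    # Void elements never take an end tag, so they are not pushed on the stack.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "param", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.stack = []  # names of currently open (non-void) elements
        self.bare = []   # bare text runs found so far

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> open and close at once

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, discarding any elements left
        # unclosed inside it (e.g. <li> without </li>). Stray end tags with
        # no matching start tag are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack and self.stack[-1] == "body":
            self.bare.append(text)


def find_bare_text(html):
    """Hypothetical helper: return the bare text runs found in one page."""
    finder = BareTextFinder()
    finder.feed(html)
    finder.close()
    return finder.bare


if __name__ == "__main__":
    page = ("<html><body><h1>Title</h1>Loose text"
            "<p>Wrapped</p> <br>More loose</body></html>")
    print(find_bare_text(page))  # ['Loose text', 'More loose']
```

The sketch does not model HTML's implied end tags. For example, an <h2>
implicitly closes an open <p>, and the sketch does not know this. As a
result, some bare text after an unclosed <p> is not reported. The check
therefore under-reports rather than over-reports. Pages it flags certainly
need attention. Pages it passes may still need a look.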

Sebastian


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

