Subject: RE: html to xml
From: Sebastian Rahtz <sebastian.rahtz@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 27 Oct 2000 10:31:29 +0100
Lisa van Gelder writes:
> The basic problem is that the html you are getting is not structured enough
> for your purposes.
>
> I had the same problem, and solved it by setting rules for how the html
> could be structured, so it could be converted into xml more easily. I do not
> allow any text that is not surrounded by tags.

I was afraid someone would say that. My problem is that the task is to
convert our existing web pages (6196 documents, at last count) to XML
conforming to the TEI DTD, so I have no control over the original coding.

So the conclusion is, I guess: "clean up the HTML minimally even before
running tidy".

Sebastian

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list