Subject: Re: html to xml
From: David Carlisle <davidc@xxxxxxxxx>
Date: Fri, 27 Oct 2000 11:02:23 GMT

> So the conclusion
> is, I guess, "clean up the HTML minimally even before running tidy".
> I was afraid someone would say that. My problem is that the task is to
> convert our existing web pages (6196 documents, at last count) to (TEI DTD

I wasn't sure quite what your context was. Surely grabbing floating
PCDATA and sticking it in a paragraph element is something easily done
in the post tidy XSL transformation to TEI.

Grabbing html section heads into TEI/docbook style section containers is
always a pain but you can do it in XSL with the usual "grouping"
techniques. It's made a bit easier if you know that the H? elements all
appear in "correct" sequence, not jumping from h1 to h3. If you use
ISO-HTML DTD then the SGML parser (eg sx) will add any missing section
levels automagically if you set the appropriate parameter entity.

David

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
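
For illustration only, here is a minimal sketch of the heading "grouping" idea described in the message, written in Python rather than XSL. It is not the technique from the post itself: the function name `group_sections` and the tuple format for the flat input are hypothetical, and it assumes the headings have already been extracted in document order. Floating text is wrapped as a paragraph in whichever section is currently open.

```python
def group_sections(items):
    """Nest a flat list of headings and text into section containers.

    items: list of ("h", level, title) or ("p", text) tuples.
    This input format is an assumption made for the sketch.
    """
    root = {"level": 0, "title": None, "content": []}
    stack = [root]  # currently open sections, outermost first
    for item in items:
        if item[0] == "h":
            _, level, title = item
            # Close every open section at the same or a deeper level.
            while stack[-1]["level"] >= level:
                stack.pop()
            section = {"level": level, "title": title, "content": []}
            stack[-1]["content"].append(section)
            stack.append(section)
        else:
            # Floating text becomes a paragraph in the innermost open section.
            stack[-1]["content"].append({"p": item[1]})
    return root["content"]


flat = [
    ("p", "intro"),
    ("h", 1, "A"),
    ("p", "a text"),
    ("h", 2, "A.1"),
    ("p", "x"),
    ("h", 1, "B"),
]
nested = group_sections(flat)
```

If a document jumps from h1 to h3, this sketch simply nests the h3 directly under the h1 instead of inserting the missing level, which is why headings in "correct" sequence make the job easier.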