Subject: RE: html to xml
From: Sebastian Rahtz <sebastian.rahtz@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 26 Oct 2000 17:01:52 +0100

Joseph Kesselman/Watson/IBM writes:
>
> >If your HTML is valid, you can try James Clark's tool SX
>
> If it isn't valid HTML, "tidy" will clean it up... and then XMLify it, if
> you use the right options. Tidy is available from the W3C's website.

hmm. having been fighting this tidy-then-transform system for the last
day or two, can anyone tell me how they solve two (related) problems?

a) as we know, authors scatter <h1>, <h3> etc. across their document
   like pointers. my target DTD needs structured divisions. who has
   some good XSLT code to sort it out? I have evolved a dirtyish
   solution, involving disable-output-escaping, but if someone else
   has a reliable clean system, I'd love to see it

b) HTML allows PCDATA practically anywhere, so far as I can see. so I
   get

     <h3>Hello</h3>
     I am the walrus

   where my target DTD wants something more like

     <h3>Hello</h3>
     <p>I am the walrus

How do others deal with this?

sebastian

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
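To make the two problems concrete, here is a minimal sketch of the intended restructuring. It is written in Python with the standard-library ElementTree rather than XSLT, and it is not the dirtyish solution mentioned above. The function name `structure` and the use of `<div>` for a structured division are assumptions made for illustration; a real target DTD will have its own element names. It assumes the input has already been through tidy and is well-formed XML.

```python
import copy
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"^h([1-6])$")


def structure(body):
    """Return a new <body> in which each heading opens a nested <div>
    (hypothetical division element) and stray text is wrapped in <p>."""
    root = ET.Element("body")
    # Stack of (heading level, container); level 0 is the body itself.
    stack = [(0, root)]

    def add_text(text):
        # Problem (b): loose PCDATA becomes a paragraph in the current division.
        if text and text.strip():
            p = ET.SubElement(stack[-1][1], "p")
            p.text = text.strip()

    add_text(body.text)
    for original in body:
        child = copy.deepcopy(original)  # leave the input tree untouched
        tail, child.tail = child.tail, None
        match = HEADING.match(child.tag)
        if match:
            # Problem (a): close every open division at the same or a deeper
            # level, then open a new one under whatever remains.
            level = int(match.group(1))
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div")
            div.append(child)
            stack.append((level, div))
        else:
            stack[-1][1].append(child)
        add_text(tail)
    return root


if __name__ == "__main__":
    src = "<body><h1>Intro</h1><h3>Hello</h3>I am the walrus</body>"
    print(ET.tostring(structure(ET.fromstring(src)), encoding="unicode"))
```

A skipped level (an <h3> directly after an <h1>) simply nests one division deep instead of two. The sketch also wraps each run of loose text separately, so text interleaved with inline elements such as <b> would need extra handling before this is usable on real pages.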