RE: html to xml

Subject: RE: html to xml
From: Sebastian Rahtz <sebastian.rahtz@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 26 Oct 2000 17:01:52 +0100
Joseph Kesselman/Watson/IBM writes:
 > 
 > >If your HTML is valid, you can try James Clark's tool SX
 > 
 > If it isn't valid HTML,  "tidy"  will clean it up... and then XMLify it, if
 > you use the right options. Tidy is available from the W3C's website.

hmm. having been fighting this tidy-then-transform system for the last
day or two, can anyone tell me how they solve two (related) problems?

 a) as we know, authors scatter <h1>, <h3> etc across their document
 like pointers. my target DTD needs structured divisions. who has some
 good XSLT code to sort it out? I have evolved a dirtyish solution,
 involving disable-output-escaping, but if someone else has a reliable
 clean system, I'd love to see it.

 b) HTML allows PCDATA practically anywhere, so far as I can see. so
 I get

   <h3>Hello</h3>
   I am the walrus

 where my target DTD wants something more like

  <h3>Hello</h3>
  <p>I am the walrus</p>

  How do others deal with this?
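
 For reference, one possible XSLT 1.0 approach to both (a) and (b),
 offered as a sketch under assumptions rather than a tested solution.
 It assumes the tidied input has its headings and stray text as direct
 children of <body>, that the elements are in no namespace (if tidy's
 output carries the XHTML namespace, the match patterns need a prefix
 bound to it), and that only h1 to h3 occur. The output names <div> and
 <head> are hypothetical stand-ins for whatever the target DTD uses.
 Keys attach each node to the nearest heading before it, so no
 disable-output-escaping is needed.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml"/>

  <!-- (a) Every non-heading child of body is "owned" by the nearest
       heading that precedes it. Nodes before the first heading get
       the empty string as their key value. -->
  <xsl:key name="owned"
      match="body/node()[not(self::h1 or self::h2 or self::h3)]"
      use="generate-id(preceding-sibling::*[self::h1 or self::h2 or self::h3][1])"/>

  <!-- (a) Every h2/h3 is a subdivision of the nearest preceding heading
       of a higher level. Two declarations share one key name, so a
       skipped level (h1 followed directly by h3) still nests. -->
  <xsl:key name="sub" match="body/h2"
      use="generate-id(preceding-sibling::h1[1])"/>
  <xsl:key name="sub" match="body/h3"
      use="generate-id(preceding-sibling::*[self::h1 or self::h2][1])"/>

  <!-- Identity transform: copy anything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Start the nesting: content before any heading first, then the
       top-level divisions (every h1, plus any h2/h3 with no parent). -->
  <xsl:template match="body">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="key('owned', '')"/>
      <xsl:apply-templates select="h1 | key('sub', '')"/>
    </xsl:copy>
  </xsl:template>

  <!-- Each heading becomes a division: its title, then the content it
       owns, then its subdivisions in document order. -->
  <xsl:template match="body/h1 | body/h2 | body/h3">
    <div>
      <head><xsl:apply-templates/></head>
      <xsl:apply-templates select="key('owned', generate-id())"/>
      <xsl:apply-templates select="key('sub', generate-id())"/>
    </div>
  </xsl:template>

  <!-- (b) Non-blank text sitting directly under body gets a p wrapper. -->
  <xsl:template match="body/text()[normalize-space()]">
    <p><xsl:value-of select="normalize-space()"/></p>
  </xsl:template>

</xsl:stylesheet>
```

 Applied to the walrus fragment above, this should produce a <div>
 holding <head>Hello</head> followed by <p>I am the walrus</p>. One
 known limitation: the text template wraps each text node separately,
 so loose text interrupted by inline markup (text, then <b>, then more
 text) comes out as separate paragraphs around the inline element and
 would need extra grouping logic.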

sebastian


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

