Subject: RE: html to xml
From: "Lisa van Gelder" <lisa@xxxxxxxxxxxxxxxxx>
Date: Fri, 27 Oct 2000 10:09:56 +0100
> a) as we know, authors scatter <h1>, <h3> etc across their document
> like pointers. my target DTD needs structured divisions.

> b) HTML allows PCDATA practically anywhere, so far as I can see. so
> I get
>   <h3>Hello</h3>
>   I am the walrus

The basic problem is that the HTML you are getting is not structured enough
for your purposes.

I had the same problem, and solved it by setting rules for how the HTML
could be structured, so that it could be converted into XML more easily. I do
not allow any text that is not surrounded by tags.
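
A rule like "no text outside tags" can be checked before conversion. The
following is only a minimal sketch in Python, not the checker I actually use.
It assumes that "bare" text means text sitting directly inside html, body or
div, or outside any element at all. The names CONTAINERS, BareTextChecker and
find_bare_text are made up for illustration.

```python
from html.parser import HTMLParser

# Assumption: these elements may hold other elements but never text directly.
CONTAINERS = {"html", "body", "div"}
# Elements that never have a closing tag, so they are not tracked as open.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class BareTextChecker(HTMLParser):
    """Collects text that is not wrapped in a non-container element."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements, innermost last
        self.problems = []  # bare text fragments found so far

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy nesting.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and (not self.stack or self.stack[-1] in CONTAINERS):
            self.problems.append(text)


def find_bare_text(html):
    """Return the list of text fragments that break the rule."""
    checker = BareTextChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Run on the fragment quoted above, find_bare_text("<h3>Hello</h3>\nI am the
walrus") reports the walrus line, and it reports nothing once that line is
wrapped in <p>. Documents that fail the check can go back to the author
instead of into the conversion.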

Whether that approach works for you depends on what you are trying to do, and
on how much say you have over the HTML that is created.

Lisa


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

