Re: HTML to DocBook translation

Subject: Re: HTML to DocBook translation
From: Marcus Carr <mrc@xxxxxxxxxxxxxx>
Date: Fri, 06 Feb 1998 09:12:35 +1100

Thomas G. Lockhart wrote:

> > I want to translate some HTML documents into the DocBook format. I've
> > started with the following dsl:
> > Has someone a working (it must not be perfect) solution?
> I'm embarrassed to say that I started my one-time translation by writing a little perl script to do brute force pattern
> substitution, then hand-edited from there.
> Let me know if you come up with your much more elegant solution, or if you want me to send my pitifully inadequate one :)

I'll preface this by pointing out that I'm not intimately familiar with DSSSL, so I make no claims about whether this
approach is more or less appropriate than any other, just that we used it and the circumstances and scope of the project seem

We recently converted a large body of legislation from HTML to a proprietary DTD. We used OmniMark to validate the HTML,
ran about four incremental conversion stages (also in OmniMark) to get almost to where we wanted to be, then finished it by
hand. Conversions of this type are never pretty, but being able to combine pattern matching with rules based on element
context minimises the hassle.
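
To give a rough idea of what a rule based on element context looks like, here is a minimal sketch in Python rather than
OmniMark (it is not the code we used). It assumes the HTML has already been validated so that elements are properly
nested. The element mapping, the choice of bridgehead for headings, the role="bold" convention and the names
HtmlToDocBook and convert are all illustrative assumptions. The one context rule shown is that <b> becomes an emphasis
element in running text but is dropped inside a heading.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical one-to-one mappings; a real conversion needs many more.
SIMPLE_MAP = {
    "p": "para",
    "ul": "itemizedlist",
    "ol": "orderedlist",
    "li": "listitem",
    "h1": "bridgehead",
    "h2": "bridgehead",
    "h3": "bridgehead",
}
HEADINGS = {"h1", "h2", "h3"}


class HtmlToDocBook(HTMLParser):
    """Sketch of a converter that consults the stack of open elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []  # (html_tag, docbook_tag or None), outermost first
        self.out = []    # output fragments

    def handle_starttag(self, tag, attrs):
        in_heading = any(html in HEADINGS for html, _ in self.stack)
        if tag == "b":
            # Context rule: bold is meaningful in text, redundant in headings.
            docbook = None if in_heading else "emphasis"
            if docbook:
                self.out.append('<emphasis role="bold">')
        else:
            docbook = SIMPLE_MAP.get(tag)
            if docbook:
                self.out.append(f"<{docbook}>")
        self.stack.append((tag, docbook))

    def handle_endtag(self, tag):
        if not any(html == tag for html, _ in self.stack):
            return  # stray end tag; ignore it
        # Close everything down to (and including) the matching element.
        while self.stack:
            html, docbook = self.stack.pop()
            if docbook:
                self.out.append(f"</{docbook}>")
            if html == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))


def convert(html_text):
    parser = HtmlToDocBook()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling convert("<h1><b>Act</b> 1</h1><p>A <b>bold</b> word</p>") drops the bold markup inside the heading and keeps it
inside the paragraph. In practice each incremental stage would be a pass like this with its own small set of rules, and
whatever the rules cannot decide is left for the hand edit at the end.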


