Re: HTML to DocBook translation

Subject: Re: HTML to DocBook translation
From: Marcus Carr <mrc@xxxxxxxxxxxxxx>
Date: Fri, 06 Feb 1998 09:12:35 +1100
Thomas G. Lockhart wrote:

> > I want to translate some HTML documents into the DocBook format. I've
> > started with the following dsl:
> > Does anyone have a working (it need not be perfect) solution?
> I'm embarrassed to say that I started my one-time translation by writing a little Perl script to do brute-force pattern
> substitution, then hand-edited from there.
> Let me know if you come up with a much more elegant solution, or if you want me to send my pitifully inadequate one :)

I'll preface this by pointing out that I'm not intimately familiar with DSSSL, so I make no claims about whether this
approach is more or less appropriate than any other, just that we used it and the circumstances and scope of the project seem

We recently converted a large amount of legislation from HTML to a proprietary DTD. We used OmniMark to validate the HTML,
ran about four incremental stages (also in OmniMark) to get almost to where we wanted to be, then finished it by hand.
Conversions of this type are never pretty, but being able to combine pattern matching with rules based on element context
minimises the hassles.
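
To make the idea of incremental stages a little more concrete, here is a minimal sketch in Python rather than OmniMark. It is
not the code we used; the element mapping, the function names and the choice of rules are all assumptions made up for
illustration. Stage 1 is plain pattern substitution, and stage 2 applies rules that depend on the enclosing element.

```python
import re
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical HTML-to-DocBook element mapping, for illustration only.
SIMPLE_MAP = {"p": "para", "ul": "itemizedlist", "ol": "orderedlist",
              "em": "emphasis", "tt": "literal"}
VOID = {"br", "hr", "img"}  # elements with no end tag; dropped in this sketch


def normalise(html):
    """Stage 1: pattern substitution with no knowledge of context.

    Rewrites <b>/<i> (and their end tags) to <em> so that later
    stages only have to deal with one emphasis element.
    """
    return re.sub(r"<(/?)[bi]>", r"<\g<1>em>", html, flags=re.IGNORECASE)


class ContextConverter(HTMLParser):
    """Stage 2: rules that depend on the enclosing element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []  # (html_tag, closing_markup) for each open element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        parent = self.stack[-1][0] if self.stack else None
        if tag == "li" and parent in ("ul", "ol"):
            # Context rule: a list item inside a list also needs a para.
            opening, closing = "<listitem><para>", "</para></listitem>"
        elif tag in SIMPLE_MAP:
            name = SIMPLE_MAP[tag]
            opening, closing = f"<{name}>", f"</{name}>"
        else:
            opening = closing = ""  # unknown element: drop the tag, keep its text
        self.stack.append((tag, closing))
        self.out.append(opening)

    def handle_endtag(self, tag):
        # Only close what is actually open; stage 0 (validation) is
        # assumed to have caught badly nested input already.
        if self.stack and self.stack[-1][0] == tag:
            self.out.append(self.stack.pop()[1])

    def handle_data(self, data):
        self.out.append(escape(data))


def convert(html):
    """Run the stages in order and return the converted markup."""
    parser = ContextConverter()
    parser.feed(normalise(html))
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(convert("<ul><li>One <b>bold</b> item</li></ul>"))
```

Each stage stays small enough to check on its own, and whatever the rules cannot decide is left for the hand-editing pass.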


Marcus Carr                  email:  mrc@xxxxxxxxxxxxxx
Allette Systems (Australia)  email:  info@xxxxxxxxxxxxxx
Level 10, 91 York Street     www:
Sydney 2000 NSW Australia    phone:  +61 2 9262 4777
                             fax:    +61 2 9262 4774
