converting SGML DTDs was Re: DD: DTD for authoring.

Subject: converting SGML DTDs was Re: DD: DTD for authoring.
From: Christian Leutloff <leutloff@xxxxxxxxxxxxxxxxx>
Date: 26 Jan 1998 22:19:55 +0100
Tony Graham <tgraham@xxxxxxxxxxxxxxxx> writes:

> We chose DocBook several months ago, and the Procedure Library and the
> Glossary are already in DocBook.  It is my intention that the Cookbook
> items will be converted to DocBook, but I haven't got to it yet.  (Are
> there any takers for a comparatively simple conversion job?)

Isn't it possible to do this job automagically?

jade is meant to have an SGML backend that can be used for this kind of
conversion, but I haven't used jade for this purpose so far.
Here's the transform.html from my Debian jade package:

<---------------- schnipp
Using Jade for SGML transformations

Jade does not support the DSSSL Transformation Language. However, it
provides some simple, non-standardized extensions to the DSSSL Style
Language that allow it to be used for SGML transformations.

These extensions are used in conjunction with the SGML backend which is
selected with the -t sgml or -t xml options. Unlike other backends, the SGML
backend writes its output to the standard output.

The -t xml option makes empty elements and processing instructions use the
XML syntax. Note that the XML declaration is not automatically emitted.

The extensions consist of a collection of flow object classes that are used
instead of the standard DSSSL-defined flow object classes:

element
empty-element
     Each of these flow objects results in an element in the output. The
     element flow object is a compound flow object (one that can have child
     flow objects). Both a start-tag and an end-tag are generated for this
     flow object. The empty-element is an atomic flow object (one that
     cannot have child flow objects). Only a start-tag is output for this.
     It should be used for elements with a declared content of EMPTY
     or with a content reference attribute. Both of these flow objects
     support the following non-inherited characteristics:
     gi
          This is a string-valued characteristic that specifies the
          element's generic identifier. It defaults to the generic
          identifier of the current node.
     attributes
          This specifies the element's attributes as a list of lists each of
          which consists of exactly two strings, the first specifying the
          attribute name and the second the attribute value. It defaults to
          the empty list.
processing-instruction
     This is an atomic flow object that results in a processing instruction.
     It supports the following non-inherited characteristics:
     data
          This is a string-valued characteristic that specifies the content
          of the processing instruction. It defaults to the empty string.
document-type
     This is an atomic flow object that results in a DOCTYPE declaration. It
     supports the following non-inherited characteristics:
     name
          This is a string-valued characteristic that specifies the name of
          the document type (which must be the same as the name of the
          document element). It must not be omitted.
     system-id
          This is a string-valued characteristic that specifies the system
          identifier of the document type. If non-empty, this will be used
          as the system identifier in the doctype declaration. The default
          value is the empty string.
     public-id
          This is a string-valued characteristic that specifies the public
          identifier of the document type. If non-empty, this will be used
          as the public identifier in the doctype declaration. The default
          value is the empty string.
entity
     This is a compound flow object that stores its content in a separate
     entity. It supports the following non-inherited characteristic:
     system-id
          The system identifier of the entity. For now this is treated as a
          filename, not as an FSI.

     Note that no entity reference or declaration is emitted.
entity-ref
     This is an atomic flow object that results in an entity reference. It
     supports the following non-inherited characteristic:
     name
          The name of the entity.
formatting-instruction
     This is an atomic flow object that inserts characters into the output
     without change. It supports the following non-inherited characteristic:
     data
          This is the string to be inserted.

     It differs from normal data characters in that &, < and > will not be
     escaped.

There is also the following characteristic:

preserve-sdata?
     This is an inherited boolean characteristic that applies to character
     flow objects. When true, if the current-node for the character flow
     object was an sdata node, then the character will be output as a
     reference to an entity with the same name. The initial value is #f.

Each of these flow object classes must be declared using
declare-flow-object-class in any DSSSL specification that makes use of it. A
suitable set of declarations is:

(declare-flow-object-class element
  "UNREGISTERED::James Clark//Flow Object Class::element")
(declare-flow-object-class empty-element
  "UNREGISTERED::James Clark//Flow Object Class::empty-element")
(declare-flow-object-class document-type
  "UNREGISTERED::James Clark//Flow Object Class::document-type")
(declare-flow-object-class processing-instruction
  "UNREGISTERED::James Clark//Flow Object Class::processing-instruction")
(declare-flow-object-class entity
  "UNREGISTERED::James Clark//Flow Object Class::entity")
(declare-flow-object-class entity-ref
  "UNREGISTERED::James Clark//Flow Object Class::entity-ref")
(declare-flow-object-class formatting-instruction
  "UNREGISTERED::James Clark//Flow Object Class::formatting-instruction")
(declare-characteristic preserve-sdata?
  "UNREGISTERED::James Clark//Characteristic::preserve-sdata?"
  #f)

Here's a simple example that does the identity transformation:

<!doctype style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN">

(declare-flow-object-class element
  "UNREGISTERED::James Clark//Flow Object Class::element")

(define (copy-attributes #!optional (nd (current-node)))
  (let loop ((atts (named-node-list-names (attributes nd))))
    (if (null? atts)
        '()
        (let* ((name (car atts))
               (value (attribute-string name nd)))
          (if value
              (cons (list name value)
                    (loop (cdr atts)))
              (loop (cdr atts)))))))

(default (make element
               attributes: (copy-attributes)))

Note that this does not deal with empty elements or processing
instructions, nor does it include a doctype declaration.

James Clark

<---------------- schnapp
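
Two of the gaps named in the note above (empty elements and the doctype
declaration) could probably be filled along the following lines. This is
an untested sketch that uses only the flow object classes described in
transform.html. The element names XREF and RECIPE, the target name SECT1,
the document element name BOOK and the choice of the DocBook V3.0 public
identifier are illustrative assumptions; none of them is taken from the
Cookbook DTD.

```dsssl
<!doctype style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN">

(declare-flow-object-class element
  "UNREGISTERED::James Clark//Flow Object Class::element")
(declare-flow-object-class empty-element
  "UNREGISTERED::James Clark//Flow Object Class::empty-element")
(declare-flow-object-class document-type
  "UNREGISTERED::James Clark//Flow Object Class::document-type")

; Same helper as in the identity example: build a list of
; (name value) pairs for every attribute that has a value.
(define (copy-attributes #!optional (nd (current-node)))
  (let loop ((atts (named-node-list-names (attributes nd))))
    (if (null? atts)
        '()
        (let* ((name (car atts))
               (value (attribute-string name nd)))
          (if value
              (cons (list name value)
                    (loop (cdr atts)))
              (loop (cdr atts)))))))

; Emit a DOCTYPE declaration, then process the document element.
; ASSUMPTION: the output document element is BOOK (DocBook V3.0).
(root
  (sosofo-append
    (make document-type
          name: "BOOK"
          public-id: "-//Davenport//DTD DocBook V3.0//EN")
    (process-children)))

; ASSUMPTION: XREF stands for an element declared EMPTY in the source
; DTD, so only a start-tag is written for it.
(element XREF
  (make empty-element
        attributes: (copy-attributes)))

; ASSUMPTION: RECIPE is a hypothetical source element that should
; become SECT1 in the output. Its attributes are dropped here because
; they may not be declared for the target element.
(element RECIPE
  (make element
        gi: "SECT1"))

; Everything else is copied unchanged.
(default
  (make element
        attributes: (copy-attributes)))
```

With the specification saved as convert.dsl (a made-up file name), the
invocation would be something like
"jade -t sgml -d convert.dsl cookbook.sgml > cookbook-docbook.sgml",
since the SGML backend writes to the standard output. A real conversion
would need one renaming rule per source element whose name differs in
DocBook.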

Bye
  Christian

-- 
Christian Leutloff, Aachen, Germany         leutloff@xxxxxxxxxxxxxxxxx  
      http://www.oche.de/~leutloff/         leutloff@xxxxxxxxxx      

            Debian GNU/Linux - http://www.de.debian.org/

