Re: HTML to DocBook translation

Subject: Re: HTML to DocBook translation
From: Norman Walsh <norm@xxxxxxxxxxxxx>
Date: Thu, 5 Feb 1998 08:08:32 -0500
> (default 
>   (let* ((old-gi (gi (current-node)))
> 	 (new-gi
> 	  (case old-gi
> 	    (("H1") "sect2")
> 	    (("H2") "sect3")
> 	    (("H3") "sect4")
> 	    (("H4") "sect5")
> 	    (("LI") "item")

You'll probably want this to be "listitem".

> 	    (("UL") "itemizedlist")
> 	    (("I")  "emphasize")

You'll probably want this to be "emphasis".

> 	    (("TT") "command")
> 	    (("P") "para")
> 	    (else old-gi))))
>     (make element
> 	  gi: new-gi
> 	  attributes: (copy-attributes))))
[...]
> But how can I translate the <a>-tag?
> 
> It is a label and a reference:
> 
> <a name="label">labelpos</a>
> 
> and
> 
> <a href="label">see labelpos</a>
> 
> Any ideas?

I'd bet that the default rule isn't going to be enough to do the whole
job.  Instead, write specific rules for some elements.

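(element A
  ;; Each branch yields a list holding one (name value) pair, or the
  ;; empty list; append splices them into a flat attribute list.
  (let ((attr (append
                (if (attribute-string "NAME")
                    (list (list "ID" (attribute-string "NAME")))
                    '())
                (if (attribute-string "HREF")
                    (list (list "ULINK" (attribute-string "HREF")))
                    '()))))
    (make element gi: "A"
          attributes: attr
          (process-children))))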

The attr construction is off the top of my head and may not be
quite right, but that's the idea.
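
If the goal is valid DocBook rather than a renamed A element, one
possible refinement is to split the two uses apart.  This is only a
sketch under some assumptions: it assumes Jade's SGML backend, with the
element and empty-element flow object classes declared as shown, and it
assumes NAME should become an empty ANCHOR with an ID attribute and
HREF should become a ULINK with a URL attribute.  An HREF that points
at a label inside the same document would more likely want LINK with a
LINKEND attribute instead.

(declare-flow-object-class element
  "UNREGISTERED::James Clark//Flow Object Class::element")
(declare-flow-object-class empty-element
  "UNREGISTERED::James Clark//Flow Object Class::empty-element")

(element A
  (let ((name (attribute-string "NAME"))
        (href (attribute-string "HREF")))
    (make sequence
      ;; <a name="x"> leaves an empty ANCHOR marking the spot
      (if name
          (make empty-element gi: "ANCHOR"
                attributes: (list (list "ID" name)))
          (empty-sosofo))
      ;; <a href="x"> wraps the link text in a ULINK; otherwise the
      ;; text passes through unwrapped
      (if href
          (make element gi: "ULINK"
                attributes: (list (list "URL" href))
                (process-children))
          (process-children)))))

Handling the two attributes separately means an A that carries both
NAME and HREF still produces both the anchor and the link.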

I think you'll also want a rule for LI that inserts at least a containing
para, unless your HTML source has <UL><LI><P>...</LI> tags already:

(element LI
  (make element gi: "LISTITEM"
	(make element gi: "PARA"
		(process-children))))

You could improve it further by doing a little look-ahead to see what
was needed after the LISTITEM tag.
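
A rough sketch of that look-ahead, assuming it is enough to check
whether the first child of the LI is already a P (in which case the
default rule turns it into a para and no extra wrapper is needed).
The name first-gi is just a local variable for illustration; gi
returns #f when the first child is text rather than an element:

(element LI
  (let* ((kids     (children (current-node)))
         (first-gi (if (node-list-empty? kids)
                       #f
                       (gi (node-list-first kids)))))
    (make element gi: "LISTITEM"
      (if (equal? first-gi "P")
          ;; content already starts with a paragraph: pass it through
          (process-children)
          ;; bare text or inline content: supply the containing para
          (make element gi: "PARA"
            (process-children))))))

This only looks at the first child, so an LI with text followed by a
later P would still need more careful handling.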

> Is there a short reference for the DocBook DTD on 2 (or so) pages
> available?

Depends what you want.  Here's a collection of RefPurposes from my
DocBook book (not yet finished).  I'm not sure this fits on two pages,
but I don't think I can get any closer ;-)

abbrev         An abbreviation, especially one followed by a period
abstract       A summary
accel          A GUI keyboard shortcut
ackno          Acknowledgements in an Article
acronym        A contraction of initials, frequently pronounceable
action         A response to a user event
address        A real-world address, generally a postal address
affiliation    An author's institutional affiliation
alt            Text representation for a graphical element
anchor         A spot in the document
appendix       An appendix in a book
application    The name of a software program
area           A region defined in a graphic or code example for a Callout
areaset        A set of related areas in a graphic or code example
areaspec       A collection of regions in a graphic or code example
arg            An argument in a CmdSynopsis
artheader      Metainformation for an Article
article        An article
artpagenums    The page numbers of an article as published
attribution    An attribution
author         The name of an author
authorblurb    A short description of an author
authorgroup    Wrapper for author information when a document has multiple authors
authorinitials The initials or other short identifier for an author
beginpage      The location of a page break in a print version of the document
bibliodiv      A section of a Bibliography
biblioentry    An entry in a Bibliography
bibliography   A bibliography
bibliomisc     Untyped bibliographic information
bibliomixed    An entry in a bibliography
bibliomset     A container for related bibliographic information
biblioset      A container for related bibliographic information
blockquote     A quotation set off from the main text
book           A book
bookbiblio     Information about a book used in a bibliographical citation
bookinfo       Metainformation for a Book
bridgehead     A free-floating heading
callout        A "called out" description of a marked Area
calloutlist    A list of Callouts
caution        A note of caution
chapter        A chapter, as of a book
citation       An inline bibliographic reference to another published work
citerefentry   A citation to a reference page
citetitle      The title of a cited work
city           The name of a city in an address
classname      The name of a class, in the object-oriented programming sense
cmdsynopsis    A synopsis for a command
co             The location of a callout embedded in text
collab         A group of collaborators
collabname     The names of a group of collaborators
colspec        Specifications for a column in a table
command        The name of an executable program or other command
comment        A comment intended for presentation in a draft manuscript
computeroutput Data, generally text, displayed or presented by a computer
confdates      The dates of a conference for which a document was written
confgroup      A wrapper for document meta-data about a conference
confnum        An identifier, frequently numerical, associated with a conference for which a document was written
confsponsor    The sponsor of a conference for which a document was written
conftitle      The title of a conference for which a document was written
contractnum    The contract number of a document
contractsponsor The sponsor of a contract
contrib        The contributions made to a document
copyright      Copyright information about a document
corpauthor     A corporate author
corpname       The name of a corporation
country        The name of a country
database       The name of a database, or part of a database
date           The date of publication or revision of a document
dedication     A wrapper for the dedication page or pages of a book
docinfo        Meta-data for a book component
edition        The name or number of an edition of a document
editor         The name of the editor of a document
email          An email address
emphasis       Emphasised text
entry          A cell in a table
entrytbl       A subtable appearing in place of an Entry in a table
envar          An environment variable
epigraph       A short introduction, typically a quotation, at the beginning of a document
equation       A displayed mathematical equation
errorcode      An error code
errorname      An error message
errortype      The classification of an error message
example        A formal example, with a caption
fax            A fax number
figure         A formal figure, generally an illustration, with a caption
filename       The name of a file
firstname      The first name of a person
firstterm      The first occurrence of a term
footnote       A footnote
footnoteref    A cross reference to a footnote (a footnote mark)
foreignphrase  A word or phrase in a language other than the primary language of the document
formalpara     A paragraph with a title
funcdef        A function (subroutine) name and its return type
funcparams     Parameters for a function referenced through a function pointer in a synopsis
funcprototype  The prototype of a function
funcsynopsis   The synopsis of a function definition
funcsynopsisinfo Information supplementing the FuncDefs of a FuncSynopsis
function       The name of a function or subroutine, as in a programming language
glossary       A glossary
glossdef       A definition in a GlossEntry
glossdiv       A division in a Glossary
glossentry     An entry in a Glossary or GlossList
glosslist      A wrapper for a set of GlossEntrys
glosssee       A cross-reference from one GlossEntry to another
glossseealso   A cross-reference from one GlossEntry to another
glossterm      A glossary term
graphic        A displayed graphical object (not an inline)
graphicco      A graphic that contains callout areas
group          A group of elements in a CmdSynopsis
guibutton      The text on a button in a graphical user interface
guiicon        Graphic and/or text appearing as an icon in a graphical user interface
guilabel       The text of a label in a graphical user interface
guimenu        The name of a menu in a graphical user interface
guimenuitem    The name of a terminal menu item in a graphical user interface
guisubmenu     The name of a submenu in a graphical user interface
hardware       A physical part of a computer system
highlights     A summary of the main points discussed in a book component (chapter, section, etc.)
holder         The name of the individual or organization that holds a copyright
honorific      The title of a person
important      An admonition set off from the text
index          An index
indexdiv       A division in an index
indexentry     An entry in an index
indexterm      A wrapper for terms to be indexed
informalequation A displayed mathematical equation without a title
informalexample An untitled example
informaltable  An untitled table
inlineequation An untitled mathematical equation or expression occurring inline
inlinegraphic  An object containing or pointing to graphical data to be rendered inline
interface      An element of a graphical user interface
interfacedefinition The name of a formal specification of a graphical user interface
invpartnumber  An inventory part number
isbn           The International Standard Book Number of a document
issn           The International Standard Serial Number of a journal
issuenum       The number of an issue of a journal
itemizedlist   A list in which each entry is marked with a bullet or other dingbat
itermset       A set of index terms in the metainformation of a document
jobtitle       The title of an individual in an organization
keycap         The text printed on a key on a keyboard
keycode        The internal, frequently numeric, identifier for a key on a keyboard
keycombo       A combination of input actions
keysym         The symbolic name of a key on a keyboard
keyword        One of a set of keywords describing the content of a document
keywordset     A set of keywords describing the content of a document
legalnotice    A statement of legal obligations or requirements
lineage        The portion of a person's name indicating a relationship to ancestors
lineannotation A comment on a line in a verbatim listing
link           A hypertext link
listitem       A wrapper for the elements of a list item
literal        Inline text that is some literal value
literallayout  A wrapper for lines of text in which line breaks and white space are to be reproduced faithfully
lot            A list of the titles of formal objects (as tables or figures) in a document
lotentry       An entry in a list of titles
manvolnum      The volume number of the section of a complete set of UNIX reference pages to which a reference entry belongs
markup         A string of formatting markup in text which is to be represented literally
medialabel     A name which identifies the physical medium on which some information resides
member         An element of a simple list
menuchoice     A selection or series of selections off a menu
modespec       Application-specific information necessary for the completion of an OLink
mousebutton    The conventional name of a mouse button
msg            A message in a message set
msgaud         The audience to which a message in a message set is relevant
msgentry       A wrapper for an entry in a message set
msgexplan      Explanatory material relating to a message in a message set
msginfo        Information about a message in a message set
msglevel       The level of importance or severity of a message in a message set
msgmain        The primary component of a message in a message set 
msgorig        The origin of a message in a message set
msgrel         A related component of a message in a message set
msgset         A detailed set of messages, usually error messages, containing both the messages and additional information about the conditions under which those messages occur
msgsub         A subcomponent of a message in a message set
msgtext        The actual text of a component of a message in a message set
note           A message set off from the text
olink          A link that addresses its target indirectly, through an entity
option         An option for a command
optional       Optional information contained in a synopsis
orderedlist    A list in which each entry is marked with a sequentially incremented label
orgdiv         A division of an organization
orgname        The name of an organization other than a corporation 
otheraddr      Uncategorized information in address
othercredit    A person or entity other than an author or editor that is to be credited in a document
othername      A component of a person's name that is not a first name, surname, or lineage
pagenums       The numbers of the pages in a book, for use in a bibliographic entry
para           A paragraph
paramdef       Data type information and the name of the parameter this information applies to in a function prototype or synopsis
parameter      A value or a symbolic reference to a value
part           A high-level sectioning element in a book
partintro      An introduction to the contents of a part
phone          A telephone number
phrase         A span of text
pob            A post office box in an address
postcode       A postal code in an address
preface        Introductory matter preceding the first chapter of a book
primary        The primary word or phrase under which an index term should be sorted
primaryie      A primary term in an index entry, not in the text
printhistory   The printing history of a document
procedure      A list of operations to be performed in a well defined sequence
productname    The formal name of a product
productnumber  A number assigned to a product
programlisting A literal listing of all or part of a program
programlistingco A program listing with associated areas used in callouts
prompt         Character or string indicating the start of an input field in a computer display
property       A unit of data associated with some part of a computer system
pubdate        The date of publication of a document
publisher      The publisher of a document
publishername  The name of the publisher of a document
pubsnumber     A number assigned to a publication other than an ISBN or ISSN or inventory part number
quote          An inline quotation
refclass       The scope or other indication of applicability of a reference entry
refdescriptor  A substitute for the names in a reference entry when the entry in question covers more than one topic and none of the RefNames is to be used as the name for identifying the reference entry as a whole
refentry       A reference page (originally a UNIX man-style reference page)
refentrytitle  The title of a reference page
reference      A collection of reference entries
refmeta        Metainformation for a reference entry
refmiscinfo    Metainformation for a reference entry other than the title and volume number
refname        The name of (one of) the subject(s) of a reference page
refnamediv     Naming, purpose, and classification information for a reference page
refpurpose     The purpose of the subject of a reference page
refsect1       A major subsection of a reference entry
refsect1info   Metainformation for a RefSect1
refsect2       A subsection of a RefSect1
refsect2info   Metainformation for a RefSect2
refsect3       A subsection of a RefSect2
refsect3info   Metainformation for a RefSect3
refsynopsisdiv A syntactic synopsis of the subject of the reference page
refsynopsisdivinfo Metainformation for a RefSynopsisDiv
releaseinfo    Information about a particular version of a document
replaceable    Content that may or must be replaced in a synopsis or command line
returnvalue    The value returned by a function
revhistory     A history of the revisions to a document
revision       An entry describing a single revision in the history of the revisions to a document
revnumber      The number of a revision to a document
revremark      A description of a revision to a document
row            A row in a table
sbr            An explicit line break in a command synopsis
screen         Text that a user sees or might see on a computer screen
screenco       A screen with associated areas used in callouts
screeninfo     Information about how a screen shot was produced
screenshot     A representation of what the user sees or might see on a computer screen
secondary      A secondary word or phrase in an index term
secondaryie    A secondary term in an index entry, not in the text
sect1          A top-level section of a document
sect1info      Metainformation for a Sect1
sect2          A subsection within a Sect1
sect2info      Metainformation for a Sect2
sect3          A subsection within a Sect2
sect3info      Metainformation for a Sect3
sect4          A subsection within a Sect3
sect4info      Metainformation for a Sect4
sect5          A subsection within a Sect4
sect5info      Metainformation for a Sect5
see            Part of an index term directing the reader instead to another entry in the index
seealso        Part of an index term directing the reader also to another entry in the index
seealsoie      A "See also" entry in an index, not in the text
seeie          A "See" entry in an index, not in the text
seg            An element of a list item in a segmented list
seglistitem    A list item in a segmented list
segmentedlist  A segmented list, a list of sets of elements
segtitle       The title of an element of a list item in a segmented list
seriesinfo     Information about the publication series of which a book is a part
seriesvolnums  Numbers of all the volumes in a series, for use in SeriesInfo
set            Two or more books
setindex       An index to a set of books
setinfo        Metainformation for a Set
sgmltag        A component of SGML markup
shortaffil     A brief description of an affiliation
shortcut       A key combination for an action that is usually also accessible through menus or other means
sidebar        A component of a document, often presented in a box, that is isolated from the narrative flow of the main text
simpara        A paragraph that contains only text and inline markup, no block elements
simplelist     An undecorated list of single words or short phrases
simplesect     A section of a document with no subdivisions
spanspec       Formatting information for a spanned column in a table
state          A state or province in an address
step           A unit of action in a procedure
street         A street address in an address
structfield    A field in a structure (in the programming language sense)
structname     The name of a structure (in the programming language sense)
subject        One of a group of terms describing the subject matter of a document
subjectset     A set of terms describing the subject matter of a document
subjectterm    A term in a group of terms describing the subject matter of a document
subscript      A subscript (as in H-2-O, the molecular formula for water)
substeps       A wrapper for steps that occur within steps in a procedure
subtitle       The subtitle of a document
superscript    A superscript (as in x-squared, the mathematical notation for x times itself)
surname        A family name, in western cultures the "last name"
symbol         A name that is replaced by a value before processing
synopfragment  A fragment in a synopsis
synopfragmentref A reference to a fragment of a command synopsis
synopsis       Syntax of a command or function
systemitem     System-related term or item
table          A formal table in a document
tbody          A wrapper for the rows of a table or informal table
term           The word or phrase being defined or described in a variable list
tertiary       A tertiary word or phrase in an index term
tertiaryie     A tertiary term in an index entry, not in the text
tfoot          A table footer consisting of one or more rows
tgroup         A wrapper for the main content of a table, or part of a table
thead          A table header consisting of one or more rows
tip            A suggestion to the user, set off from the text
title          The text of the title of a section of a document or of a formal block-level element
titleabbrev    An abbreviated title
toc            A table of contents
tocback        An entry in a table of contents for a back matter component (Bibliography, Glossary, etc.)
tocchap        An entry in a table of contents for a chapter-like component (Chapter or Appendix)
tocentry       An entry (as for a section) in a table of contents
tocfront       An entry in a table of contents for a front matter component (Dedication, Preface, etc.)
toclevel1      A top-level entry within a table of contents entry for a chapter-like component
toclevel2      A second-level entry within a table of contents entry for a chapter-like component
toclevel3      A third-level entry within a table of contents entry for a chapter-like component
toclevel4      A fourth-level entry within a table of contents entry for a chapter-like component
toclevel5      A fifth-level entry within a table of contents entry for a chapter-like component
tocpart        An entry in a table of contents for a part of a book
token          Unit of information in the context of lexical analysis
trademark      A trademark
type           The classification of a value
ulink          A link that addresses its target by means of a URL, a Uniform Resource Locator
userinput      Data entered by the user
varargs        An empty element in a function synopsis indicating a variable number of arguments
variablelist   A list in which each entry is composed of a set of one or more terms and an associated description
varlistentry   A wrapper for a set of terms and the associated description in a variable list
void           An empty element in a function synopsis indicating that the function in question takes no arguments
volumenum      The volume number of a document in a set (as of books in a set or articles in a journal)
warning        An admonition set off from the text
wordasword     A word meant specifically as a word and not representing anything else
xref           A cross reference to another part of the document
year           The year of publication of a document




 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

