Re: About the source library

Subject: Re: About the source library
From: Joerg Wittenberger <Joerg.Wittenberger@xxxxxxxxx>
Date: 04 May 1999 03:52:55 +0200
>>>>> "ADC" == Adam Di Carlo <adam@xxxxxxxxxxx> writes:

ADC> Joerg Wittenberger <Joerg.Wittenberger@xxxxxxxxx> writes:
>> I *guess* there is some flaw in jade making it so slow.  At least
>> the backend interface is a design flaw I can hardly live with.

ADC> I'm actually not sure whether it's jade, or an implementational
ADC> flaw in the DSSSL spec.

I'm not sure either.  IMHO a straightforward implementation of DSSSL
*probably* leads to a complexity worse than O(n) (several tree
traversals), but I'm not sure that it has to.  It's definitely easy to
write complex, hence slow, DSSSL programs without noticing.
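
To make the complexity point concrete, here is a minimal Python sketch
(not DSSSL, and not code from jade or sdc).  The tuple-based tree and
both function names are assumptions made up for this illustration.  It
numbers paragraphs twice: once by re-scanning the preceding siblings
for every node, once with a counter carried through a single pass, and
counts how many nodes each variant touches.

```python
def make_flat_doc(n):
    """Hypothetical document: one root with n 'para' children."""
    return ("doc", [("para", []) for _ in range(n)])

def number_naive(doc):
    """Number each child by re-counting all siblings up to it."""
    visits = 0
    numbers = []
    children = doc[1]
    for i in range(len(children)):
        count = 0
        for sibling in children[:i + 1]:  # re-scan from the start each time
            visits += 1
            if sibling[0] == "para":
                count += 1
        numbers.append(count)
    return numbers, visits

def number_single_pass(doc):
    """Same numbering, carrying a running counter through one pass."""
    visits = 0
    numbers = []
    count = 0
    for child in doc[1]:
        visits += 1
        if child[0] == "para":
            count += 1
        numbers.append(count)
    return numbers, visits

doc = make_flat_doc(100)
naive_numbers, naive_visits = number_naive(doc)    # 5050 node visits
fast_numbers, fast_visits = number_single_pass(doc)  # 100 node visits
```

Both variants produce the same numbers; only the amount of tree
walking differs, which is the kind of cost that is easy to overlook
when it hides behind a convenient query in a style sheet.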

ADC> BTW, I've used Joerg's sdc -- it's very good and fast, although
ADC> since it doesn't understand DSSSL, any additional DTD style
ADC> support has to be written from scratch.

That's a bit misleading - and I think Adam got that impression
because I did a bad (no) job of documenting the internal design.

Sdc, while it started as a practical formatter, was a prototype of a
transformation engine.  I tried to get along without the distinction
between a style engine and a transformation engine.  Instead I
envisioned a "pipe" of transformations.  As I needed a working program,
it came quite far.

I think this design is still worth explaining and discussing here,
because it might have some value as a proposal for whatever will be
done next:

The idea was to come up with a set of FOs.  Those were supposed to
be defined using a DTD.  Supporting a certain DTD on input would mean
writing a transformation from that DTD to the FO-DTD (maybe via
multiple, cascaded transformations).
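
The cascade described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the element
names (letter, article, fo-paragraph and so on) and the helpers
rename_stage and pipe are hypothetical, and a real stage would do far
more than rename elements.  Each stage is one traversal of the tree,
and a pipe is just the composition of stages.

```python
import xml.etree.ElementTree as ET
from functools import reduce

def rename_stage(mapping):
    """Build a stage that renames elements in one traversal of the tree."""
    def stage(elem):
        out = ET.Element(mapping.get(elem.tag, elem.tag), dict(elem.attrib))
        out.text, out.tail = elem.text, elem.tail
        out.extend([stage(child) for child in elem])
        return out
    return stage

def pipe(*stages):
    """Compose stages left to right into a single transformation."""
    return lambda tree: reduce(lambda t, s: s(t), stages, tree)

# Hypothetical vocabularies: input DTD -> intermediate DTD -> FO-DTD.
letter_to_generic = rename_stage(
    {"letter": "article", "body": "section", "p": "para"})
generic_to_fo = rename_stage(
    {"article": "fo-sequence", "section": "fo-sequence",
     "para": "fo-paragraph"})

to_fo = pipe(letter_to_generic, generic_to_fo)

src = ET.fromstring("<letter><body><p>Dear list,</p></body></letter>")
result = ET.tostring(to_fo(src), encoding="unicode")
```

Supporting another input DTD then means writing one more first stage,
while the later stages and the backends stay untouched.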

The FO-DTD, which never got written, offers approximately what "basic"
LaTeX (plus HTML minus plain TeX) has to offer (paragraphs, inline
FOs, graphics, lists, xref/links).  Those FOs are implemented to a
feasible extent on a variety of backends (ranging from GNU info and
nroff -mm to LaTeX, Lout and HTML).

A single transformation - in contrast to DSSSL - was unable to express
arbitrary, multiple traversals of the tree.  Those would have to be
filtered out (say when analysing the DSSSL program) and could be
placed either into a prior pipe stage or, in the worst case, folded
into a second, full traversal.

SUBDOC support, pretty expensive with jade, came in quite naturally and
very cheaply as one such pipe stage.

There are only a few such transformations written so far.  Mostly a DTD
of mine, which was designed after QWERTZ, omitting everything which
looks vaguely like style (<toc> etc.) and saving me any typing effort
possible; some special additions designed to prepare classes (overhead
slides mixed with handouts and teacher annotations, filtering at the
transformation level, so as not to clutter up my writings with "<![
%HandOut; [ ...]]>"); a letter DTD, man page and linuxdoc.

In addition there is a mostly unique feature: sensible NDATA handling.
Any NDATA entity, even CDATA element content, is transparently
converted into something sensible for the output.  (That way you can
have some pic or Lout pictures within your GNU/info document, which
are embedded as GIF in HTML.)
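
One way to picture this kind of NDATA handling is a lookup keyed by
notation and output format.  The sketch below is an assumption-laden
illustration in Python, not how sdc does it: the registry, the
decorator and the placeholder output strings are all invented here,
and a real handler would actually render the picture.

```python
CONVERTERS = {}  # hypothetical registry: (notation, output format) -> function

def converter(notation, output_format):
    """Register a handler for one notation/output-format pair."""
    def register(func):
        CONVERTERS[(notation, output_format)] = func
        return func
    return register

@converter("pic", "html")
def pic_to_html(data):
    # Placeholder: a real handler would render the drawing to a GIF file.
    return '<img src="figure.gif" alt="pic drawing, %d bytes">' % len(data)

@converter("pic", "info")
def pic_to_info(data):
    # Placeholder: a real handler might produce an ASCII rendering.
    return "[pic drawing, %d bytes]" % len(data)

def handle_ndata(notation, output_format, data):
    """Pick the handler for this pair, or fall back to a visible marker."""
    func = CONVERTERS.get((notation, output_format))
    if func is None:
        return "[unsupported %s entity]" % notation
    return func(data)
```

The document itself never names an output format; the same entity
simply comes out differently depending on the backend in use.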

The sorry detail (and here Adam is right): a) the separation of style
transformation from FO implementation is not yet completed; b) there is
no style language reader yet.  Instead I hand-coded the data
structures the style definition reader (DSSSL or whatever) should
produce.  Hence I must regard that Scheme code as the style definition
- I'm very sorry about that.

ADC> capability to it (and maybe DSSSL later).  I don't know if a
ADC> grove-based engine would make all the performance gains, and code
ADC> simplicity disappear.  It's definitely a 'community' project
ADC> rather than an 'individual' project.

That is actually the thing I have not managed to find out yet.  It
looks to me as if one has to be *very* careful about the representation
of grove nodes (especially about separating out access to nodes which
are closer to the root).  Another problem yet to be investigated is the
one-node-per-character illusion.


I'd like to propose that we design/document the following interfaces:

Backend Interface

Flow Object tree DTD (FO-DTD).  The backend reads a well-formed XML
document (without a DTD? for performance) to produce the desired
formatted object.  I'd like to see this working with a pipe interface,
but others might be more appropriate.
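
A toy backend along these lines might look as follows in Python.  It
is a sketch only: the FO element names (fo-sequence, fo-paragraph,
fo-emph) are hypothetical since no FO-DTD exists yet, and the "desired
formatted object" here is merely wrapped plain text.

```python
import io
import textwrap
import xml.etree.ElementTree as ET

def render_plain_text(stream, width=40):
    """Read a well-formed FO document from a stream, emit wrapped text."""
    root = ET.parse(stream).getroot()
    blocks = []
    for para in root.iter("fo-paragraph"):
        # Flatten inline FOs and normalise whitespace before wrapping.
        text = " ".join("".join(para.itertext()).split())
        blocks.append(textwrap.fill(text, width=width))
    return "\n\n".join(blocks)

fot = io.StringIO(
    "<fo-sequence>"
    "<fo-paragraph>Hello   <fo-emph>pipe</fo-emph> world</fo-paragraph>"
    "<fo-paragraph>Second</fo-paragraph>"
    "</fo-sequence>")
output = render_plain_text(fot)
```

Because the function only needs a readable stream, passing sys.stdin
instead of the StringIO object would turn it into the last stage of a
shell pipe.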

Front End Parser

The front end parser composes the document from storage objects and
delivers something easy to parse, or any stream interface to be agreed
upon.  A second, SAX-style event API might be worthwhile, but is
certainly not as efficient for transformers as it is for interactive
programs.
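
For readers unfamiliar with the SAX style, here is what such an event
interface looks like using Python's standard xml.sax module.  The
EventLog class and parse_events function are names invented for this
sketch; it shows only the shape of the API, not a proposal for the
actual front end.

```python
import xml.sax

class EventLog(xml.sax.ContentHandler):
    """Collect parser callbacks as a flat list of events."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Skip whitespace-only chunks to keep the log readable.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))

def parse_events(xml_bytes):
    """Parse a document and return the events the handler received."""
    handler = EventLog()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events

events = parse_events(b"<doc><p>hi</p></doc>")
```

A transformer that needs the whole subtree has to rebuild it from
these callbacks, which is part of why an event API suits interactive
programs better than transformers.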

There shall be a variety of parsers, ranging from fully SGML-compliant
to well-formed, non-validating XML.  Also "basic" LaTeX might be
fun (useful), and database queries come to mind.

Style Definition Reader Interface

The style definition reader delivers a set of transformation
specifications.  I think this is the part which needs most of the
discussion.  The representation must at least allow transformations to
be reordered (and maybe to contain uninterpreted data).  Access to
upper nodes and any other way to create self references within a node
should be easy to spot.
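
As a starting point for that discussion, here is one possible shape
for such a representation, sketched in Python.  Everything in it is an
assumption: the Rule fields, the needs_ancestors flag and the
split_stages helper are invented for illustration and are not what sdc
uses internally.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    match: str                     # element name the rule applies to
    emit: str                      # flow object to construct
    needs_ancestors: bool = False  # rule looks at nodes closer to the root
    opaque: tuple = ()             # uninterpreted data, passed through as is

def split_stages(rules):
    """Separate purely local rules from rules that reach up the tree.

    Local rules can run in one streaming pass; the others are candidates
    for a prior pipe stage or a second traversal.
    """
    local = [r for r in rules if not r.needs_ancestors]
    contextual = [r for r in rules if r.needs_ancestors]
    return local, contextual

rules = [
    Rule("para", "fo-paragraph"),
    Rule("title", "fo-heading", needs_ancestors=True),  # depth-dependent
    Rule("emph", "fo-emph"),
]
local, contextual = split_stages(rules)
```

Making upward access an explicit, declared property of a rule is what
keeps it easy to spot; a reader for DSSSL or XSL would have to derive
that flag by analysing the style sheet.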

There shall be a variety of style languages possible.  Certainly DSSSL
and XSL.

I don't dare yet to propose a certain interface/representation.  Maybe
closer investigation reveals that we could even stick with slightly
extended XSL here (I could imagine it, but I don't think so).

Style Extension Interface

The "uninterpreted data" (above), if present at all, gets handed to
a pluggable interpreter with some access to the already styled FOT.
Or whatever.  I did not think about that yet.  I just wanted to open
the door to non-functional approaches.  But that might not be desirable
at all.

NDATA Handler Interface

For each NDATA definition there shall be a defined way to transform
the corresponding entity into an FO.  This step must allow external
programs to be started, which are told about the storage entity and
the desired output format.  Those programs are not allowed to have
side effects visible to the transformation/formatting process.
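
A minimal sketch of such a handler call in Python, under assumptions
worth stating: the entity is passed on standard input and the output
format as the last command-line argument (the interface above does not
fix either), and DEMO_HANDLER is a stand-in for a real converter.
Treating the program as a pure filter, bytes in and bytes out, is one
simple way to keep side effects away from the formatting process.

```python
import subprocess
import sys

def run_ndata_handler(argv, entity_bytes, output_format):
    """Run an external converter as a filter and return its output."""
    completed = subprocess.run(
        argv + [output_format],  # assumed convention: format as last argument
        input=entity_bytes,      # assumed convention: entity on stdin
        capture_output=True,
        check=True,              # raise if the converter fails
    )
    return completed.stdout

# Stand-in converter: tags the data with the requested format.
DEMO_HANDLER = [
    sys.executable, "-c",
    "import sys; fmt = sys.argv[1]; data = sys.stdin.buffer.read(); "
    "sys.stdout.buffer.write(fmt.encode() + b':' + data.upper())",
]

converted = run_ndata_handler(DEMO_HANDLER, b"circle", "html")
```

A failing converter raises subprocess.CalledProcessError here, so the
calling stage can decide whether to abort or to substitute a
placeholder FO.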

