Re: (dsssl) Re: The Future of DSSSL

Subject: Re: (dsssl) Re: The Future of DSSSL
From: Brandon Ibach <bibach@xxxxxxxxxxxxxx>
Date: Tue, 1 Jan 2002 23:03:08 -0600

Quoting Trent Shipley <tcshipley@xxxxxxxxxxxxx>:
> I would be willing to take on at least part of the task described by B.I. 
> (below) since I don't mind doing documentation as long as the coders are 
> cooperative and actually enjoy the analysis part of programming.
> 
   Great!  As I noted, a read-through of the "Jade Internals" document
would probably be the best place to start, though it appears to be
unavailable at the moment.  I'm trying to track it down.
   Following that, I'd look into tools which can automatically
document the class relationships in the software.  I haven't had a
chance to do this, and don't really know anything about it.  If you
want to research this, please let me know what you find.

> I have read the DSSSL spec. once and am printing out HyTime 1997, 2nd ed. as 
> I write this.  Presumably there is little point to starting the source code 
> until the underlying problem is understood.
> 
   Yes, a good understanding of DSSSL will be important.  HyTime is
related, but not essential.

> <opinion>
> A standard network representation of an SGML resource is a Grove.  This 
> implies a modular application below the level of OJ that parses and presents 
> the Grovered data.  (Is FOT the serial form of this representation?)
> 
   A Grove is a "directed graph", capable of representing all sorts of
structured information, not just SGML.  A Grove is constructed
according to a "Grove plan", which is a set of classes, properties and
data types.  Each node in the Grove has a class.  The class determines
what properties the node has.  Each property has a data type.  The
property can be non-nodal, meaning it is something like a string or an
integer, or it can be nodal, meaning it refers to another node.  Nodal
properties form the "graph" of the nodes.  Check out chapter 9 of the
DSSSL spec for more details on all of this.
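   To make the model concrete, here is a minimal sketch in Python
(used purely for illustration; this is not DSSSL or Jade code).  The
two-class grove plan, the property names and the helper data_of are
all made up for the example and are not taken from the SGML property
set.  It only shows a class determining a node's properties, and the
split between nodal and non-nodal properties.

```python
from dataclasses import dataclass, field

# Hypothetical miniature grove plan: class name -> {property: data type}.
# The names are illustrative and not taken from the SGML property set.
GROVE_PLAN = {
    "element": {"gi": "string", "content": "nodelist"},
    "data-char": {"char": "string"},
}

# Data types that refer to other nodes; everything else is non-nodal.
NODAL_TYPES = {"node", "nodelist"}


@dataclass
class Node:
    node_class: str
    props: dict = field(default_factory=dict)

    def __post_init__(self):
        # The class determines which properties a node may have.
        allowed = GROVE_PLAN[self.node_class]
        unknown = sorted(set(self.props) - set(allowed))
        if unknown:
            raise ValueError(f"{self.node_class} has no property {unknown}")

    def is_nodal(self, prop):
        return GROVE_PLAN[self.node_class][prop] in NODAL_TYPES


def data_of(node):
    """Gather character data by following the nodal 'content' property."""
    if node.node_class == "data-char":
        return node.props["char"]
    return "".join(data_of(child) for child in node.props.get("content", []))


para = Node("element", {
    "gi": "para",
    "content": [Node("data-char", {"char": c}) for c in "Hi"],
})
print(data_of(para))
```

   Only "content" is nodal here, so it alone contributes edges to the
graph; "gi" and "char" just hold values.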
   A Flow Object Tree (FOT) really has nothing to do with Groves.  An
FOT is just a generic description of a formatted document.  The
various Jade backends (TeX, RTF, MIF, HTML, etc) take an FOT and
produce a formatted document.  One of the assertions in my original
message was that an FOT could be represented by an appropriate Grove
plan, thus making the DSSSL formatting process simply a specific type
of transformation (from an SGML Grove into an FOT Grove).

> It is unfortunate that Node Regular Expressions (grove queries), SGML 
> Transforms and Formatting all wound up in the same DSSSL specification.
> 
   It's not so bad, really, but it's very perceptive of you to see
that they don't *have* to be connected to one another.  The W3C's
versions of these are separate standards (XPath, XSLT and XSLFO,
respectively).

> 1) An SGML parser
> 
   This would be SP/OpenSP.  It's just fine as it is for 99.9% of SGML
applications today, in my opinion.

> 2) An SGML-to-Grove translator with at least two backends
> 2.1) A C based data-object (structure)
> 2.2) FOT serialized output
> 
   This would be SPGrove, which is part of Jade/OpenJade.  It also
works fairly well, although there are large parts of the SGML Grove
Plan (the SGML Property Set, to be more accurate) that it does not
support.  But, it does support the most commonly-needed parts.
Expanding this support would allow for a variety of applications that
are not currently possible.

> 3) Grove queries (node regular expressions.  This should be extended to 
> include some sort of PERLescent stream-based regular expressions despite the 
> fact that the leaf nodes of a grove and characters for normal regexes are the 
> same.)
> 
   Node regular expressions != Grove queries.  DSSSL specifies an API
for obtaining information from Groves.  It is called SDQL (the
Standard Document Query Language) and is documented in chapter 10 of
the DSSSL spec.  Node regular expressions are a rather odd beast: so
far as I'm aware, they have never been implemented, and they are
understood by very few people.  In fact, after several readings of
that section of the spec, I still don't quite understand what's going
on with them.
   As far as a more user-friendly Grove querying mechanism, I suggest
something like XPath, with an appropriate mapping to the Grove model.
Such a syntax could be compiled into SDQL forms for execution.  In
fact, I have some code from several years ago, written by Eliot
Kimber, that provides support for XPath/XPointer in DSSSL.
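
   As a rough illustration of the "compile a path into query calls"
idea, here is a minimal Python sketch.  It is not the code mentioned
above, and it is not SDQL: the helpers children and select_elements
are hypothetical stand-ins for grove query primitives, the grove is
faked with nested dicts, and only child steps by element name are
handled.

```python
def children(nodes):
    """Child element nodes of every node in the list, in document order."""
    return [c for n in nodes for c in n.get("content", []) if isinstance(c, dict)]


def select_elements(nodes, gi):
    """Keep only the nodes whose element type name matches gi."""
    return [n for n in nodes if n.get("gi") == gi]


def compile_path(path):
    """Turn a path like 'chapter/title' into a reusable query function."""
    steps = [step for step in path.split("/") if step]

    def query(start):
        nodes = [start]
        for gi in steps:
            # Each path step becomes one children + select_elements pair.
            nodes = select_elements(children(nodes), gi)
        return nodes

    return query


# Hypothetical document, with the grove faked as nested dicts.
book = {"gi": "book", "content": [
    {"gi": "chapter", "content": [
        {"gi": "title", "content": ["Intro"]},
        {"gi": "para", "content": ["Some text"]},
    ]},
    {"gi": "chapter", "content": [
        {"gi": "title", "content": ["Next"]},
    ]},
]}

chapter_titles = compile_path("chapter/title")
print([t["content"][0] for t in chapter_titles(book)])
```

   The path is parsed once and the resulting function can be applied
to any starting node, which is the appeal of compiling rather than
interpreting the path on every call.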

> 4) Grove Transforms 
> 
   That'd be the DSSSL Transformation Language, of course.  Again,
this has hardly seen the light of day: it has been implemented, but
not much used, so far as I'm aware.  The TL is not easy to get
started with, mainly due, I believe, to a lack of educational
materials.  But it is very powerful.  I hope we can change this with a
mainstream implementation and some practical examples of why people
would want to use it.

> 5) Formatting
> 
> Furthermore, the new and improved OJ's modular structure implies that its 
> formatting engine does not in fact provide TeX, PDF, or any other direct 
> output.  Instead the client application calls the root formatting object and 
> depending on whether it called the C++ code or the extern C wrapper for the 
> C++ code the client app gets back a Formatted Object Descriptor.  The client 
> application then does whatever it does with the Formatted Object Descriptor.
> 
   It already works this way, mostly.  The various backends all plug
into Jade the same way (more specifically, they all inherit from a
common class), and that common interface is how Jade provides access
to the FOT.  It's the backend's job to do something useful with the
FOT, which can be as simple as serializing it to an XML representation
(as the "FOT" backend does).
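
   The shape of that arrangement can be sketched in a few lines of
Python.  This is only an illustration under assumptions: the class
and method names (Backend, XmlBackend, start, characters, end, walk)
are invented for the example and are not Jade's actual C++ interface,
and the FOT is reduced to nested (name, children) tuples.

```python
from abc import ABC, abstractmethod
from xml.sax.saxutils import escape


class Backend(ABC):
    """Hypothetical common base class that every backend inherits from."""

    @abstractmethod
    def start(self, flow_object): ...

    @abstractmethod
    def characters(self, text): ...

    @abstractmethod
    def end(self, flow_object): ...


class XmlBackend(Backend):
    """Simplest useful backend: serialize the FOT as XML-like text."""

    def __init__(self):
        self.parts = []

    def start(self, flow_object):
        self.parts.append(f"<{flow_object}>")

    def characters(self, text):
        self.parts.append(escape(text))

    def end(self, flow_object):
        self.parts.append(f"</{flow_object}>")

    def result(self):
        return "".join(self.parts)


def walk(fot, backend):
    """The engine side: drive any backend through the common interface."""
    if isinstance(fot, str):
        backend.characters(fot)
        return
    name, kids = fot
    backend.start(name)
    for kid in kids:
        walk(kid, backend)
    backend.end(name)


fot = ("paragraph", ["a < b"])
xml = XmlBackend()
walk(fot, xml)
print(xml.result())
```

   The walk function never needs to know which backend it is driving,
so a TeX or RTF backend would only have to supply its own versions of
the three methods.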

> I would like to see Grove queries, Grove transforms, and Grove based 
> formatting provided as separate C++ coded utilities.  Ideally there will 
> also be an interface for plain C through C++'s 'extern' keyword.
> 
   If you mean that the various components of DSSSL should be
available to be embedded into other applications such that they could
be used (relatively) independently of each other, I agree. :)

> It is unfortunate that DSSSL is an implementation instead of just a 
> functional specification.  In particular, the requirement to use the 
> unpopular LISP/Scheme language family was a marketing disaster.
> 
   Speak for yourself! :P  I think Scheme is an elegant, efficient
language.  It was chosen very specifically because it is a functional,
side-effect free language, suited for the task.
   However, I can acknowledge that Scheme can take a little getting
used to if you've never seen it before.  But, then, what programming
language doesn't take some work to get fluent in?  I've written a
significant amount of code in Perl (some years ago, granted), and I
still get baffled half the time when I look at some Perl code.

> There is no reason that DSSSL's functionality couldn't have been implemented 
> in ANSI BASIC.   There are any number of good business reasons why at least one 
> flavor of DSSSL should have been implemented in BASIC.
> 
   Can't say I'm a big BASIC fan, but when push comes to shove, if
it's important enough, I see no reason why DSSSL couldn't be fitted
into any functional language that can guarantee side-effect free
behavior.  (Yes... the side-effect free thing *is* important.)

> There is no *necessary* relation between the DSSSL idea and Scheme.
> The relation specified in the ISO spec should be broken.
> 
   Strong words.  *Very* arguable...

> It is "easy" to go from queries to transforms to formatting by building each 
> stage on the basis of code from the prior project stage.
> 
   Yes.  The DSSSL spec defines four "languages": the Expression
Language (aka Scheme) in Chapter 8, the Query Language in Chapter 10,
the Transformation Language in Chapter 11 and the Style Language in
Chapter 12.  The QL depends on the EL, the TL depends on the EL and
QL, and the SL depends on the EL and QL, also.  I'm of the opinion
that the SL could have been defined in terms of the TL, much as
formatting with XSLFO relies on XSLT.
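
   Those dependencies also give a staged implementation order, which
can be checked with a small Python sketch.  The dependency table just
restates the paragraph above; using Python's graphlib here is my own
choice for illustration and has nothing to do with DSSSL itself.

```python
from graphlib import TopologicalSorter

# Each language mapped to the languages it depends on, as described above.
DEPENDS_ON = {
    "expression language": set(),
    "query language": {"expression language"},
    "transformation language": {"expression language", "query language"},
    "style language": {"expression language", "query language"},
}

# A topological sort yields an order in which every language comes
# after everything it depends on.
build_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(build_order)
```

   The EL comes first and the QL second; the TL and SL can then be
built in either order, since neither depends on the other.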

> Though it is possible to use the formatting language to produce general 
> transforms it would be silly to do so.
> 
   Heh... wanna bet? ;)  How do you think this is being done right
now?  HTML is an SGML application, so "formatting" a DocBook (also
SGML, of course) document to HTML is really an SGML-to-SGML transform.
Jade only implements the Style Language, so that is what is being used
to accomplish these transformations, via the SGML backend (using
non-standard flow objects).
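
   In outline, the trick looks like the following Python sketch.  It
is an illustration only, not Jade or DSSSL code: the "element" flow
object is modelled as a plain tuple, and the mapping table GI_MAP and
the functions style and serialize are hypothetical names invented for
the example.

```python
from html import escape

# Hypothetical mapping from source element names to HTML element names.
GI_MAP = {"para": "p", "emphasis": "em"}


def element_fo(gi, content):
    """Stand-in for a non-standard 'element' flow object carrying a name."""
    return ("element", gi, content)


def style(node):
    """The 'formatting' step: each source element becomes an element flow object."""
    if isinstance(node, str):
        return node
    gi, kids = node
    return element_fo(GI_MAP[gi], [style(kid) for kid in kids])


def serialize(fo):
    """The backend step: write the flow objects back out as markup."""
    if isinstance(fo, str):
        return escape(fo)
    _, gi, content = fo
    return f"<{gi}>" + "".join(serialize(c) for c in content) + f"</{gi}>"


source = ("para", ["See ", ("emphasis", ["this"]), "."])
print(serialize(style(source)))
```

   Nothing here is really formatting: the flow objects just carry
element names through to the output, which is why the result is a
transform in all but name.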

> It is not silly to use transforms to produce Formatted Objects (though it 
> might be too costly to do so in terms of performance).  It will probably be 
> almost convenient to write formatting clients using the Transform C++ API.
> 
   I'm not sure exactly what you're suggesting here.  What do you mean
by "Formatted Objects"?  FOTs?  TeX?  PDF?

-Brandon :)

 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist
