(dsssl) Re: [OpenJade-devel] Re: OpenJade development

From: "Paul Tyson" <paul@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 4 Oct 2001 14:28:30 -0700

I am trying to contain my excitement at the possibility of further
development on openjade, leading to a more complete implementation of the
DSSSL standard.  There has been occasional talk of it in the last few years.
But it seems there are fewer than 20 people in the world who: a) are
concerned about the subject; and b) understand what DSSSL can do.  And of
those people, none are able to commit sustained effort to create code.

I am cross-posting this to the dsssl-list, because I think the topics you
discuss are of interest to DSSSL users as well as potential developers.  My
apologies for the length, but I wanted to include full quotes of your and
Adam's comments for the benefit of dsssl-list readers.

See my specific comments and responses to your many thoughtful observations
below.

[jf] = Javier Farreres
[ad] = Adam Di Carlo (2-level quoted text)
[pt] = Paul Tyson

> Well, when I first decided to contact someone from OpenJade, it was just to get
> information.
> Now that I have two answers, it comes my time to ask questions to the list
> solve some doubts I have.
> The fact is, before I involve any student in this, I want to have
> clear.
> First I contacted Adam Di Carlo as it was the name I got from some place.
> >> The thing is, I am not aware of the level of development of OpenJade.
> >> far as I know, it doesn't have a transformation language developed,
> >
> >It does have a proprietary transformation language.  It doesn't have
> >ISO DSSSL standard transformation, however.  On the other hand, the
> >ISO DSSSL transformation language is considered overcomplicated by
> >many and already obsoleted by XSLT, not to mention that many consider
> >the proprietary variant, implemented by James Clark in the original
> >Jade, much better anyhow.
> Ok. Here comes the first question, which is a bit long.
> First came SGML, then DSSSL and HyTime, later came XML and later XSL.
> XML can be seen as a simplified SGML (it has no marked sections for
> XSLT is a translation stylesheet language based on XML. It is not a
> language. There is no construct (as far as I know) in XSLT to say repeat
> structure n times. Because XSLT is not a programming language. It is good
> lay people to write stylesheets easily, but DSSSL, being a programming
> language, is much more powerful than XSLT.
> With all this, I mean XSLT  --CANNOT-- substitute DSSSL as a general tool.
> Apart from the fact that the development of XML has ignored groves for
> and has no general model underneath.
> Am I wrong in something?

Only in that you may underestimate the power of XSLT.  It is possible, for
instance, to repeat a structure n times.  For someone who knows functional
programming techniques, XSLT can do some wonderful things, but the syntax is
often awkward and verbose.  I stop short of saying XSLT could ever be a
complete replacement or substitute for DSSSL, partially because the
application domains are different (with a large overlapping area).  I would
estimate that for 90% of *current* publishing applications you *could* use
XML, XSLT and XSL-FO.  But there is a large set of potential applications,
exploiting the power of groves and HyTime, that have not even been
considered in *current* publishing applications, which could not even be
imagined from the XML/XSL paradigm.
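
On repeating a structure n times: the usual XSLT 1.0 idiom is a named
template that calls itself with a decremented counter parameter.  A minimal
sketch of the same recursion, written in Python rather than XSLT; the
function name `repeat` and the `<row/>` sample are illustrative assumptions,
not something from this thread.

```python
from typing import List


def repeat(structure: str, n: int) -> List[str]:
    """Return n copies of `structure` using recursion instead of a loop."""
    if n <= 0:
        # Base case: the counter has run out, so emit nothing more.
        return []
    # Emit one copy, then recurse with the counter decremented,
    # much as a recursive named template would.
    return [structure] + repeat(structure, n - 1)


rows = repeat("<row/>", 3)
print("".join(rows))
```

The counter travels as an argument rather than as mutable state, which is
why the technique works in a language without assignment loops.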

You are right on target with your observation that the lack of a
comprehensive data model in XSLT is a big weakness.

> Now to the transformation stuff. The transformation language in DSSSL is
> on groves. And the transformation language in DSSSL transforms groves, not
> documents. XSLT only transforms elements into elements. And well, it is
> that DSSSL transformation is overcomplicated. I think no one has tried to
> it consequently. I am trying it in a course I am planning on DSSSL.

The DSSSL transformation language is, conceptually, very sophisticated and
challenging.  Nothing less would be so powerful.  Unfortunately, to
understand it fully requires: a) good grasp of functional programming
methods, especially recursion on tree structures; and b) an understanding of
the abstract structure of documents according to the grove model.  Very few
people, apparently, have (or bother to acquire) those prerequisite skills.

I agree that perhaps it is lack of good teaching that has contributed to its
disregard.  Also lack of immediate practical applications other than
creating SGML documents.  James Clark himself asked (perhaps cynically) what
the transformation language would do that jade's sgml output method couldn't
do.  He's right, in the space of applications that only want sgml-to-sgml
transformations.  But the full model is
sgml-to-grove-to-grove(s)-to-whatever, with the TL doing the
grove-to-grove(s) part.  What you do with the output grove is limited only
by your imagination (and programming skill!).  This provides a standard,
universal model for practically all data and document transformation needs,
using a single consistent data model and processing paradigm.
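
The grove-to-grove-to-whatever shape can be sketched in a few lines.  This is
only an illustration under heavy assumptions: the `Node` class is a
much-simplified stand-in for a grove node, and `rename_para`, `transform`
and `emit` are hypothetical names, not part of DSSSL or of openjade.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical, much-simplified grove node: class name, properties, children."""
    cls: str
    props: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


Rule = Callable[[Node], Tuple[str, Dict[str, str]]]


def rename_para(node: Node) -> Tuple[str, Dict[str, str]]:
    """Example rule: map 'para' nodes to 'p', leave every other class alone."""
    cls = "p" if node.cls == "para" else node.cls
    return cls, dict(node.props)


def transform(node: Node, rule: Rule) -> Node:
    """Grove-to-grove step: build a new tree by recursing over the children."""
    cls, props = rule(node)
    return Node(cls, props, [transform(child, rule) for child in node.children])


def emit(node: Node) -> str:
    """The 'whatever' step: serialise the output tree as tag-like text."""
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(node.props.items()))
    inner = "".join(emit(child) for child in node.children)
    return f"<{node.cls}{attrs}>{inner}</{node.cls}>"


source = Node("doc", {}, [Node("para", {"id": "p1"}), Node("para")])
result = transform(source, rename_para)
print(emit(result))
```

The input tree is never modified; the transformation yields a second tree,
and any number of different emitters could consume it.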

The reality is, 99% of SGML/XML users will (or must) settle for
special-purpose, limited processing solutions because that is what gets
today's jobs done today.

> Apart from teaching at the university as a parttime professor, and doing
> phd, I am also involved in the publishing theme. I worked some years in a
> publishing company who developed encyclopaedias. And I would say without
> that SGML, being as complicated as it is, is even too simple for the
> Perhaps HyTime is the solution, but I know of no implementation of it.

Again, you are right on target here.  I believe HyTime processing could
easily be implemented on top of a complete DSSSL engine.

There are a few HyTime implementations, the most noteworthy being
GroveMinder, now held by Epremis (www.epremis.com).  But as far as I know
this doesn't use or understand DSSSL.  The HyTime users group home page
http://www.hytime.org is old, but has some useful information.  You can also
find a few more HyTime links at the SGML/XML Cover Pages

> Now I offer courses to publishing companies, all related to SGML. One of
> is DSSSL. I have not yet finished it, but I think I have found a good way
> teach every part of the standard. And well, when someone understands it,
> transf language in DSSSL is not so complicated.
> My question is: would it be useful to implement this part of DSSSL into
> OpenJade?

I myself would find it useful, and I believe it would have the potential to
support many wonderfully useful applications.

But the market has not demonstrated a need for this sort of thing.  The fact
is, for any *particular* SGML transformation, there are at least half a
dozen "good" ways to accomplish it, and XSLT is quickly becoming about the
easiest way to specify such transformations.  Developers do not hesitate to
string together a long sequence of preprocessing and/or postprocessing
steps, with XML+XSLT in the middle.  Two or three other scripting languages
may be involved in the process, such as ASP, SQL, etc., and neither the
developers nor the system architects are bothered by polyglot solutions.
Companies big and small are offering integrated (but inevitably proprietary
and non-standards-based) solutions to these problems.

A DSSSL+HyTime grove-based transformation system would drastically simplify
these sorts of applications, and raise the capability of "single-source
publishing" to an entirely new level.

> >> not does it have a full implementation of the grove paradigm. It
> >> doesn't have also a full implementation of all the flow objects, not
> >> the full query language.
> >
> >I think there is a partial query implementation in there.
> About it, I think Jade and OpenJade implement only the core query language, but
> not the full query language. But implementing the full query language has
> sense if the full grove paradigm is implemented.
> Anyway, DSSSL query language is designed to work on the default grove
> There are no query constructs for parts of the grove outside the default
> Would it be interesting to implement the full query language?

Definitely interesting, but as with the TL the market has not demonstrated a
need for this.  XPath implemented the 80% of query that is most useful.  I
admit I don't fully understand node regular expressions, but I believe it
could be put to creative use if it were implemented.

Any industrial-strength DSSSL engine should have a complete query
capability.  A useful extension would be a shorthand notation--perhaps XPath
itself.  I believe this could be implemented as DSSSL procedures.
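
A rough sketch of how a path shorthand could be layered on ordinary query
procedures.  It is written in Python, not DSSSL, and everything in it is an
assumption for illustration: the `Element` class, and the `children` and
`select_elements` helpers, which are only loosely modelled on the query
procedures of similar name.  It handles child steps and `*` only, nothing
like full XPath.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Hypothetical element node identified by its generic identifier (gi)."""
    gi: str
    children: List["Element"] = field(default_factory=list)


def children(nodes: List[Element]) -> List[Element]:
    """Return the children of every node in the list, in document order."""
    return [child for node in nodes for child in node.children]


def select_elements(nodes: List[Element], gi: str) -> List[Element]:
    """Keep only the nodes whose generic identifier equals gi."""
    return [node for node in nodes if node.gi == gi]


def path(nodes: List[Element], expr: str) -> List[Element]:
    """Evaluate a child-axis shorthand such as 'chapter/title' or '*/title'."""
    for step in expr.split("/"):
        nodes = children(nodes)           # each step moves one level down
        if step != "*":                   # '*' matches any element
            nodes = select_elements(nodes, step)
    return nodes


book = Element("book", [
    Element("chapter", [Element("title"), Element("para")]),
    Element("appendix", [Element("title")]),
])
print([e.gi for e in path([book], "chapter/title")])
print(len(path([book], "*/title")))
```

Each path step is just a composition of two existing procedures, which is
the sense in which the shorthand needs no new engine support.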

> >The thing I hear most often complained about is that we only implement
> >a subset of the DSSSL page layout stuff.  So full-on page layout, at
> >least, last I checked, was the most often requested feature.
> >
> >I think this might be difficult to do since you'd have to think about
> >both the DSSSL processor, but much of the work would be in the
> >backends as well, which is probably of less academic interest.  But
> >you asked!
> Well, yes. My opinion is the same. The fact that the page feature is not
> implemented pretty much leaves Jade as a toy. In the publishing area where
> I teach, this feature is of utmost interest. And well, I think that for
> student doing his final project, everything can be formative. Implementing
> part could be very nice, I think.
> Take into account, I have an unlimited source of students. I must not limit to
> offer one project. I can offer several projects, to implement different
> of OpenJade.
> >Also, see develdoc/TODO in the sources.
> I will, thanks.
> > Dear Javier,
> >
> > a while ago, I did a work on DSSSL myself, developing a PDF backend for
> > (Open)Jade in Java. Unfortunately, the (Open)Jade Project seems to be -
> > sorry guys - on a pretty low level, due to the other things most
> > do and the general shift towards XSL after all.
> This is my question. XSL and XML have great impact, as there is internet
> behind. OpenJade will always be the more complicated tool for
> publishing.
> But I think it is ok this way.
> > While DSSSL (and its main tool, Jade) is certainly a great concept, it
> > a) quite hard to understand and b) even harder to understand the code
> > "normal" people. James Clark has done an awesome job, no one doubts that,
> > the complexity is definitely high.
> With code, you mean the code of the DSSSL language, or the code of the
> implementation?
> Understanding DSSSL is just understanding lisp (ok, scheme), which is not
> difficult.
> And well, after all, Jade and DSSSL are not for "normal people" but for
> professional programmers and publishing technicians.
> And XML, with its simplicity, is no more HTML, and it requires also some
> information structuring concepts, so it is not that easy either.
> > From my point of view, a project both interesting as well as good for
> > community would be to implement the full group of the "page-sequence"
> > object - but the main part would be to get the backends render this
> > correctly. As with the separate PDF backend, it would be possible and
> > challenging, but I doubt it would be possible at all for RTF, plain text
> > Only the TeX-engine could certainly cope with it easily, but getting
> > done is certainly a thing for itself...
> Ok. As far as I see, there is a kind of consensus about the goodness of
> implementing the page feature.
> It was also one of my ideas.

Definitely, high-end publishing requires complex page layout and sequencing.
In my experience, for the formatting needs I have encountered, the style
language is capable of expressing all the requirements.  It remains only to
implement them.  I don't believe getting this information into the flow
object tree is the difficult part--rather, it's in the back-end to actually
do the page layout based on the fot.  (I could be completely wrong about
this, though.)

> Any more comments? What about implementing full groves and a real grove

Full grove implementation is essential for adding any HyTime addressing and
linking capabilities.  It would also allow other possibilities, such as
automatic DTD analysis and transformation.

By allowing additional front-end notation processors to supply input groves,
openjade could become a general-purpose transforming wizard.  For instance,
Rich Text Format documents could be delivered as groves to a DSSSL
transformation.  (GroveMinder has the capability to process other notations
to create groves, but does not use DSSSL for transformation.)

Flexible grove plans would allow creation of "lite" groves that could
enhance performance in some situations.
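
To make the "lite" grove idea concrete, here is a small sketch in which a
plan is reduced to a set of node classes to include.  That is a
simplification made for illustration; the names `GroveNode`, `build_grove`
and `count`, and the tuple-based input format, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple

# Parsed input as nested tuples: (class name, data, children).
Raw = Tuple[str, str, list]


@dataclass
class GroveNode:
    cls: str
    data: str
    children: List["GroveNode"] = field(default_factory=list)


def build_grove(raw: Raw, plan: FrozenSet[str]) -> Optional[GroveNode]:
    """Build only the nodes whose class the plan includes."""
    cls, data, kids = raw
    if cls not in plan:
        # Class excluded by the plan: this node and its subtree are never built.
        return None
    built = [build_grove(kid, plan) for kid in kids]
    return GroveNode(cls, data, [node for node in built if node is not None])


def count(node: GroveNode) -> int:
    """Count the nodes in a built grove."""
    return 1 + sum(count(child) for child in node.children)


raw_doc: Raw = ("element", "doc", [
    ("comment", "editor note", []),
    ("element", "p", [("data", "hi", []), ("pi", "x", [])]),
])

FULL = frozenset({"element", "data", "comment", "pi"})
LITE = frozenset({"element", "data"})

print(count(build_grove(raw_doc, FULL)), count(build_grove(raw_doc, LITE)))
```

The lite plan never constructs the comment and processing-instruction
nodes, so a style sheet that ignores them pays nothing for them.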

As I see it, groves are at the center of the division between ISO and W3C.
The conception of groves is what finally led to the breakthrough that
unified DSSSL and HyTime.  The failure to appreciate and understand groves
is what led W3C to create fragmented, inconsistent standards and
special-purpose syntax for local, limited-scope applications of DSSSL and
HyTime concepts.

So, for better or for worse, any implementation of DSSSL or HyTime must have
full grove-processing capabilities.

Other comments:
1. ISO standards are reviewed every 5 years for renewal, revision, or
cancellation.  With the lack of activity on DSSSL implementations, I am
concerned that it could lose its status as an active ISO standard.  (I think
Scheme itself is in a similar predicament as an IEEE standard.)  Of course,
status as a standard doesn't affect its usefulness or integrity as a
language, but being a non-standard (or worse, a has-been standard) is just
one more reason for people not to choose DSSSL.

If indeed you are prepared to commit some real resources to further
implementations of DSSSL, it might behoove others (especially non-coders
such as myself) to get involved with their national standards bodies as they
review the standard.

2. The built-in backends for jade made its implementation of the style
language immediately useful.  Some built-in backends for the transformation
language would be essential.  At minimum, it should be able to emit SGML
documents (including declarations and DTDs), and canonical grove
representation.  Other possibilities would be STEP Express instances, and
CGM files.  I believe some preliminary work on grove plans for both of these
notations was done a few years ago, but I don't know if it is possible to
recover or resuscitate these efforts.

3. Anything that would make openjade more directly usable as the back end of
an HTTP server would make it more attractive to a wider market.  I have no
idea what this would entail, but some useful features would be: persistent
grove representation; ability to invoke in-memory "compiled"
transformations; database-to-grove translation component.

If, in the best of all possible worlds, openjade came with a mini-HTTP
server built in, you could have an instant SGML server that would eliminate
the need for *any* translation or preprocessing of your SGML source data for
web publishing.  It would support HTML viewing or on-demand composition and
delivery of PDF files from the same SGML source.  With additional front-end
notation processors, it could be a complete "information server", as
envisioned (and partially implemented) by GroveMinder.
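
A toy sketch of the server idea, using only Python's standard `http.server`
module.  Nothing here reflects how openjade works: the `SOURCES` table, the
string-replacing "transformation", and the names `compiled_transform`,
`render` and `Handler` are assumptions chosen to show a transformation being
compiled once, held in memory, and applied on each request.

```python
from functools import lru_cache
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict

# Stand-in for a persistent source store, keyed by request path.
SOURCES: Dict[str, str] = {"/intro": "<para>Hello</para>"}


@lru_cache(maxsize=None)
def compiled_transform(name: str) -> Callable[[str], str]:
    """Stand-in for compiling a style sheet once and caching the result."""
    if name == "html":
        # Trivial 'transformation': rename para tags to p tags.
        return lambda src: src.replace("para>", "p>")
    raise KeyError(name)


def render(path: str) -> str:
    """Apply the cached transformation to the stored source for this path."""
    return compiled_transform("html")(SOURCES[path])


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path not in SOURCES:
            self.send_error(404)
            return
        body = render(self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


print(render("/intro"))
```

To try it, pass the handler to the standard library's server class, for
example `HTTPServer(("localhost", 8000), Handler).serve_forever()` after
importing `HTTPServer` from `http.server`.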

> Javi

I wish you success with your efforts, Javier.  As you can tell, I agree with
you 100% that it is a Good Thing to do, and I would like to help if
possible.  But the skeptical side of me says that the tidal wave of XML and
related W3C standards has all but completely washed away the hopes and
dreams embodied in ISO standards DSSSL:1996 and HyTime TC:1997.

But the tidal wave didn't wash away the real difficulties inherent in
processing complex document-based information.  It remains to be seen
whether, when the wave recedes, the efforts so ably conceived and executed
by the ISO committees will thrive and prosper, or if completely new
solutions will be discovered.

Good luck,

Paul Tyson, Principal Consultant                   Precision Documents
paul@xxxxxxxxxxxxxxxxxxxxxx              http://precisiondocuments.com
     "The art and science of document engineering."

 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist
