Subject: Re: Nostradamus (was Re: FO. lists as tables)
From: pandeng@xxxxxxxxxxxx (Steve Schafer)
Date: Fri, 15 Oct 1999 03:52:55 GMT
On Thu, 14 Oct 1999 17:38:27 -0500, you wrote:

>To get complete fidelity you'd almost have to specify the algorithms.

Almost, but not quite. What you have to specify is a set of detailed
constraints. 

>Take TeX, for example.  It produces a certain set of line breaks for a
>given text.  If I use a different algorithm I get different line
>breaks in all likelihood.

Certainly, there are many different techniques you can use to break
lines, and they give different results. (The classic paper by Knuth
and Plass illustrates this clearly.) But _any_ line-breaking algorithm
can be specified in terms of what it _achieves_ without having to
specify _how_ it achieves it.

Take TeX's line-breaking algorithm, for example. You can specify what
it does in terms of its results: It breaks lines on a per-paragraph
basis in such a way as to minimize the total number of "demerits" in
that paragraph, where "demerits" has a precise mathematical definition
(which I won't go into here).

With a specification of the _rules_ governing the TeX algorithm in
hand, you are free to implement it any way you choose. And your
implementation will break lines in exactly the same places that my
implementation does (assuming no bugs, of course).
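
To make the "specify what it achieves, not how" point concrete, here is a
toy sketch (mine, not TeX's actual algorithm): a dynamic-programming line
breaker that minimizes a total per-paragraph cost, in the spirit of TeX's
demerits. The cubed-slack cost function below is an invented stand-in for
TeX's precise demerit formula; any implementation that minimizes the same
cost must produce the same breaks.

```python
def break_lines(words, width):
    """Break words into lines minimizing total cost (toy Knuth/Plass-style DP)."""
    n = len(words)
    INF = float("inf")

    def line_cost(i, j):
        # Cost of setting words[i:j] as one line: the cube of the leftover
        # space, infinite if the words don't fit. The last line is free.
        length = sum(len(w) for w in words[i:j]) + (j - i - 1)  # interword spaces
        if length > width:
            return INF
        if j == n:
            return 0
        return (width - length) ** 3

    # best[j] = minimum cost of breaking words[0:j]; split[j] = where the
    # last line of that optimal breaking starts.
    best = [INF] * (n + 1)
    best[0] = 0
    split = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = line_cost(i, j)
            if best[i] + c < best[j]:
                best[j] = best[i] + c
                split[j] = i

    # Walk the split points backward to recover the lines.
    lines, j = [], n
    while j > 0:
        i = split[j]
        lines.append(" ".join(words[i:j]))
        j = i
    return lines[::-1]
```

The point is that the cost function alone pins down the output: two
independent implementations of this spec agree on every break, with no
need to share code.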

>I think it is possible to design a language that would allow the page
>designer to constrain composition at a very finely grained level
>(think of expressing TeX's linebreaking in terms of "policy"), but it
>would be (is, actually; I'm working on it) pretty difficult to do.

I don't think it's all that difficult. It's certainly complex and
time-consuming, in the sense that there are lots of little pieces that have
to fit together, but it's not particularly difficult. (I'm working on
it, too. :)

>And even given such a language, would it be worth the trouble for
>vendors to conform to such an exacting degree when something well
>short of 100% fidelity will satisfy just about everybody?

1) Nobody is forced to use a standard.

2) Standards can have different "levels of compliance." An
implementation can comply with a "basic" level of the specification
without complying with all aspects of it.

>So the real design goal is to find the sweet spot, where composition
>is sufficiently tightly constrained to ensure a relatively high
>degree of similarity of outcome across implementations (i.e. this
>ain't no html) but not so tightly as to frighten off implementors.

I'm not saying that XSL would be useless without 100% identical output
from every implementation--I understand the vagaries of TrueType and
different output device resolutions and all that. But the XSL
Requirements Summary says that predictability isn't even a _goal_.
That's what I find disturbing. Without a precise definition of what
"similar results" means, I assure you that we _will_ have HTML all
over again.

>Otherwise we end up with the situation we have today, where the
>meaning of language semantics is almost always - intentionally even!
>- based on implementations, and conformance is dictated by whoever
>gets to market first with the biggest PR budget.

We will have that anyway. Today, nobody writes HTML that conforms to
any particular standard, they write HTML that works in Internet
Explorer and Netscape Navigator. If the XSL processor in some
hypothetical Internet Explorer 7.0 contains a bug that requires that
the fo's be ordered in a certain way, then people will write XSLT that
transforms their XML documents into XSL fo's that work around that
bug, no matter that the XSL spec says that the fo's don't have to be
ordered in that way.

In fact, a precise, detailed specification (perhaps accompanied by an
open-source reference implementation in Java, to disambiguate any
uncertainties in the prose of the spec) would go a long way toward
making it _easier_ for the little guys to compete with the giants, by
reducing the amount of "de facto standardization" that can occur.

>For most other users its not that important; whether Madame Bovary
>gets snuffed on page 365 or 367 doesn't matter much in the grand
>scheme of things.

To you and me, probably not. But to the author, it might. I remember an
incident that occurred several years ago: I was preparing a manuscript
for submission to a scientific journal, and had carefully formatted
the document so that page breaks occurred in "good" places, etc. I had
been proofing the document on a 300 dpi inkjet printer attached to my
PC. When I had everything the way I wanted, I sent the document to the
600 dpi laser printer for the final output, only to find that because
of differences in rounding during line-height calculations, the
laser-printed output had one fewer line per page than the
inkjet-printed output.
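
The rounding effect is easy to reproduce in a few lines. The sketch below
is mine; the 11.6 pt line height and 648 pt text block are invented
numbers chosen to trigger the effect, not figures from the incident
described above.

```python
def lines_per_page(line_height_pt, page_height_pt, dpi):
    """How many lines fit on a page if line height is rounded to device pixels."""
    px_per_pt = dpi / 72.0
    # A naive rasterizer rounds the line height to a whole number of
    # device pixels before stacking lines down the page.
    line_px = round(line_height_pt * px_per_pt)
    page_px = page_height_pt * px_per_pt
    return int(page_px // line_px)

# At 300 dpi, 11.6 pt rounds down to 48 px; at 600 dpi it rounds up to
# 97 px -- so the higher-resolution device fits one line fewer:
#   lines_per_page(11.6, 648, 300)  ->  56
#   lines_per_page(11.6, 648, 600)  ->  55
```

A spec that wanted device-independent pagination would have to constrain
this, e.g. by requiring line heights to be computed in a fixed unit and
rounded identically regardless of output resolution.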

Computers are supposed to make people's lives easier. There's no
reason to have to put up with that kind of nonsense.

-Steve Schafer


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
