FOs considered helpful

Subject: FOs considered helpful
From: Paul Prescod <paul@xxxxxxxxxxx>
Date: Mon, 03 May 1999 00:28:53 -0500
People create and use dumbed-down, inaccessible information every day. I
do not think that the W3C can change that. It is merely human nature to do
the least work possible to achieve a goal. If it is not likely that a
sight-impaired person will view a particular document, few people would go
out of their way to make that document accessible. This explains, in part,
the popularity of mostly-inaccessible file formats like MSWord, RTF, TeX,
PDF and presentational HTML.

I cannot in good conscience claim that using these formats is wrong. I've
used them myself. I will continue to. It is an unfortunate fact of life
that one must measure costs and benefits, and it is much, much cheaper to
use a ready-made format than true semantic markup. The existence and
popularity of the concept of "table markup" is evidence enough that even
most semantic-markup purists have limits.

Hakon says:

> > In my organization, I have thousands of legacy word processing 
> > documents where styles have been used inconsistently. Don't you 
> > think it's better to use XFO and admit that there is no semantics 
> > rather than using HTML and claim there is? 
>       Use presentational HTML or PDF for your documents. We can't 
> risk losing the semantic Web due to legacy documents. 

But it isn't a legacy/future issue. It is a cost/benefit issue. Semantic
markup is *extremely expensive*. The maintenance of the style
specifications and document types can easily swamp the benefits for a
one-off document. It is, in my opinion, disingenuous to think that we can
move the world, or even the web, to a purely structure-oriented paradigm.
I learned early in my career not to advocate semantic XML to a customer
unless they had a problem that was costing them such a huge amount of
money that semantic XML was the only solution.

So one response to this reality is to try to make accessible markup as
easy as possible. This is why I have been arguing for ICADD-ish formatting.

Another, parallel response is to recognize that non-semantic markup is
not evil. It is just cheap. It is next to impossible to replace
something cheap with something expensive, so semantic XML will never
replace RTF. We should accept that.

The next best thing is to make a non-semantic XML document type that is

 a) as accessible as possible (but no more :),

 b) as visually sophisticated as proprietary document types,

 c) non-proprietary.

In other words, I am making the probably controversial statements that the
formatting object document *should* be viewed as a replacement for RTF and
PDF, and that this should only be viewed as "abuse" in situations where RTF
or PDF would be abuse -- i.e. where the semantic markup exists but is hidden
or where the benefits of real semantic markup outweigh the costs.

That will be a tough call in many situations, but pretending that the
non-semantic markup world does not exist will not help. All it will do is
cede this huge market to the whims of vendors. They give us RTF generators
that do not conform to the RTF "spec", non-editable languages like PDF,
languages like TeX with, AFAIK, no formal definition at all, and so forth.

One could argue that HTML *is* the official W3C non-semantic markup
language. The only problem is that HTML is not as visually sophisticated
as proprietary document types so it cannot replace them without
"extensions" such as the ones in Word 2000.

I think that the W3C should take the lead in the area of non-semantic
markup as they have in semantic markup. "If you are going to create
documents with embedded formatting anyway, do it our way: we can help it
to be accessible, portable, and universally editable."


Here is my working definition of "semantic markup": It is markup which
captures sufficient amounts of the structure of the document to allow a
reasonably broad set of automated processes to do their jobs. The
definition is necessarily vague. HTML is "semantic enough" for vanity home
pages but not for the Encyclopedia Britannica. One would like to unleash
more sophisticated automated processes on the encyclopedia than on the
home page. The more semantic a markup language is, the more specific it is
to a few documents. The "ultimate" semantic markup language is optimized
for a particular document. If you take semantic markup to this extent then
you lose the economies of scale that code reuse provides.

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself

Diplomatic term: "We had a frank exchange of views."
Translation: Negotiations stopped just short of shouting and
(Brill's Content, Apr. 1999)
