XML is broken (was Re: Why Doesn't IE5 use the DTD to Validate?)

From: "Simon St.Laurent" <simonstl@xxxxxxxxxxxx>
Date: Fri, 02 Apr 1999 13:22:55 -0500

At 08:20 AM 4/2/99 +0700, James Clark wrote on XSL-list:
>So what is this switch?  The DOCTYPE declaration? The DOCTYPE
>declaration unless it's just an internal subset containing entity
>declarations?  What if I have default attributes declared as well? What
>if I have so many entities that I use an external subset instead?  Where
>does the XML spec mention such a switch?
>
>I know Microsoft-bashing is good, clean fun, but actually they've done
>the right thing here.

Well, if IE 5 isn't broken, maybe it's time to consider (and discuss)
whether the XML spec is broken, and badly.

Validation is something that happens or it doesn't, depending on the whim
of the application.  Reading external resources is something that happens
or it doesn't, again depending on the whim of the application.  (That whim
is slightly constrained by requiring validating parsers to read external
resources.)  Namespace support is something that happens or it doesn't at
the whim of the application, and interactions with validation depend on
another set of whims.

On top of that, documents are free to identify themselves with any DTD they
like and then create their own world in the internal subset.

Is this really worth bothering with?  After writing four books discussing
the subject, I have to wonder more and more if validation and all the tools
surrounding it aren't simply too broken to be useful.

Validation as a concept is great - applications can hand off certain types of
processing to components, and everyone uses the same set of tools
(schemas/DTDs) to describe what's supposed to be in those documents.

Unfortunately, validation as implemented in XML is a painful joke:
underpowered (no data typing), overpowered (attribute defaulting is a great
idea, but doesn't always work in a nonvalidating environment), complicated
(internal/external subset issues, not to mention IGNORE/INCLUDE), not
reliable (since applications may or may not bother, and documents can
change the rules anytime anyway), not constrained by 'industry practice'
(since there isn't any consensus), and subject to a lot of intricate rules
that take a long time to master.
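
To make the attribute-defaulting complaint concrete, here is a minimal sketch,
assuming Python's xml.parsers.expat module as a stand-in for a nonvalidating
parser. The element name "doc", the attribute "status", and the file "doc.dtd"
are hypothetical and exist only for illustration. The same default is declared
once in the internal subset and once (notionally) in an external subset that
the parser is never asked to fetch.

```python
import xml.parsers.expat


def attributes_seen(document: str) -> dict:
    """Parse with expat (nonvalidating) and return the attributes
    reported for each element name, keyed by element name."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        # Record the attributes exactly as the parser hands them over.
        seen.setdefault(name, dict(attrs))

    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return seen


# Default declared in the internal subset, which the parser reads.
INTERNAL = '<!DOCTYPE doc [<!ATTLIST doc status CDATA "draft">]><doc/>'

# Same default assumed to live in a hypothetical external subset (doc.dtd),
# which expat does not load unless the application explicitly arranges it.
EXTERNAL = '<!DOCTYPE doc SYSTEM "doc.dtd"><doc/>'

print(attributes_seen(INTERNAL))
print(attributes_seen(EXTERNAL))
```

With the internal subset the application should see status="draft" on the
root element; with the external reference it should see no attributes at all,
and no error either. The instance markup is identical in both cases, and
what the application receives depends on where the declaration happens to sit
and on what the parser chooses to read.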

A better validation approach would:
* Not interfere with well-formed documents (attribute defaulting done
differently)
* Provide a simple mechanism for documents to identify their type, not all
the details about their structure.
* Be reliable.  Applications could control how documents are validated,
instead of relying on the document to provide them with a roadmap.
* Describe more than just text and elements.
* Allow supporting tools (like XSL and XLink, which benefit greatly from a
validating environment) to demand validation of documents against schemas
before attempting processing.

The current solution is an enormous mess, one that threatens to make
validation a useless discard.  I've complained about this to some extent
previously (in the Layered Model document, and on XML-dev), but it's
becoming a sorer point every time I encounter it, which happens pretty
regularly.

Maybe the schemas group can fix this, or maybe we should just chuck the
declaration end of XML entirely.

Simon St.Laurent
XML: A Primer
Sharing Bandwidth / Cookies
http://www.simonstl.com


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

