Re: SUMMARY: XML Validation Issues (was: several threads)

Subject: Re: SUMMARY: XML Validation Issues (was: several threads)
From: Chris Lilley <chris@xxxxxx>
Date: Wed, 07 Apr 1999 01:38:15 +0200

Jelks Cabaniss wrote:
> 
> Chris Lilley wrote:
> 
> > I don't sense consensus yet on whether client-side validation is always
> > desirable; it clearly is in some cases and clearly adds little in other
> > cases.
> 
> Wouldn't it depend on what the client is? 

Yes. Which is why I wrote that I don't sense consensus on this - there
are arguments both for and against: for requiring validation, for never
requiring it, etc.

>  The creation of XML 1.0 (as opposed
> to just well-formed SGML) made validation optional; didn't the designers have
> browsers in mind when they made this decision; in fact wasn't it the MAIN reason
> they made such a decision?

Probably; you would need to ask them.

> > The assertion has been made that client-side validation is a performance
> > load, compared to just parsing the dtd looking for fixed attributes etc;
> > but no performance figures were made available. If someone has a parser
> > they could instrument and provide some actual measurements on real-world
> > data, that would help.
> 
> Assuming that validation were equally as fast, I still don't think that makes a
> case for *forcing web browsers* to do what XML 1.0 says is optional. 

True, but the assertion was made that validation should never be
required because of the performance load (compared to parsing the dtd,
including external subsets, but not validating). Which raises the
question: if the performance load were negligible, would validation
then be a desirable thing?

> In another message:
> 
> > My feeling is that there are three classes of implementation, that
> > should all have names:
> >
> > minimal well-formed - never tries to follow external entities
> > full well-formed - always tries to follow external entities
> > full validating - always tries to follow external entities and validates
> 
> Agreed.

Currently, the first two are not adequately distinguished, it seems.
And, it seems that there are a lot of implementations that fall into the
second class - perhaps that is even the majority class.

> > and it should be possible to always derive what class of implementation
> > a particular instance requires. 

You don't comment on that sentence, so does it mean you agree?

> >  My current take on this is that
> >
> > "standalone="yes" is how you declare that a minimal well-formed parser
> > is sufficient; that
> 
> Sounds good.

But, it seems, standalone="no" does not mean that a minimal
well-formed parser has to reject the document with a well-formedness
error. Some people seem to think that would be desirable behaviour,
though. Or perhaps another value for "standalone" would be needed.

> > <!ELEMENT occurring anywhere in the internal or external subset is how
> > you indicate that a validating parser is required

There are two related but separate assertions that can be made:

1) this document is valid
2) this document needs a validating parser

I didn't adequately distinguish these before, which was remiss of me.

> I don't like this (though evidently a number of people are assuming or
> advocating it). 

I didn't like it much either, but it seemed to be, on inspection, what
the XML spec said.
James Clark seemed to agree, which was a good sign. But I recently heard
Tim Bray and CMSQ say that no, it doesn't mean that at all and in fact
the presence of element declarations should not be construed to mean
that the document is valid or that there is a self-consistent dtd in
there which could be validated against.


> If validation is optional, it's optional -- even if there's a
> stray <!ELEMENT ...> in the DTD.

I am tending to agree that this is what the spec says. So, there is in
fact no way to indicate the assertion "this document is valid".

>  Maybe the author is building his DTD and
> doesn't want to validate it until he's good and ready.  Maybe it's an older DTD,
> he doesn't care about validity any more, and all he wants are default attributes
> for styling purposes.  Must he remove all <!ELEMENTs just to make it viewable in
> a web browser?

These are good arguments. I observe that many parsers which can validate
stop performing well-formedness checks and start trying to do
validation checks instead once they see element declarations, but it
seems that this is not warranted. (Note to UI designers: provide two
separate icons for "validate" and "check wf".)
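
To make the case concrete, here is a minimal sketch, assuming Python's
expat binding as a stand-in for a non-validating parser (my choice for
illustration; nobody in this thread mentioned it). The sample document
and the helper name parse_non_validating are made up. The internal
subset carries an element declaration the instance does not satisfy,
plus a defaulted attribute of the kind an author might want for styling.

```python
import xml.parsers.expat

# Hypothetical sample: <doc> is declared to contain a <title>, but the
# instance has a <para> instead, so it is well-formed yet not valid.
DOC = """<!DOCTYPE doc [
  <!ELEMENT doc (title)>
  <!ATTLIST doc class CDATA "draft">
]>
<doc><para>No title here.</para></doc>"""


def parse_non_validating(text):
    """Check well-formedness only and return (name, attributes) per start tag."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    # expat reads the internal subset but does not validate against it;
    # attribute defaults declared there are still reported to the handler.
    parser.StartElementHandler = lambda name, attrs: seen.append((name, dict(attrs)))
    parser.Parse(text, True)  # raises ExpatError only on a well-formedness error
    return seen


events = parse_non_validating(DOC)
```

The parse goes through without complaint, and the defaulted "class"
attribute shows up on <doc> even though the content model is violated.
Under the reading discussed here, that is acceptable behaviour for a
parser that was only asked to check well-formedness.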

> If there is to be a way to *force* validity by specifying it in the document
> instance, the only way I can see is by amending the spec with something like (as
> I believe you yourself suggested in passing)
> 
>         valid="yes"
> 
> in the declaration.

Right. With a default of "no", of course. So, this would make the
assertion that the document was valid, and that assertion could be
tested, and perhaps refuted, by a validating parser. In the case of
valid="no" or perhaps valid="wf", a validating parser would do what?
Declare the document invalid? Agree that yes, it's invalid (so why
check it)? Automatically use a non-validating mode, even if it was
normally validating?

> > and that all othger cases are saying that the full-well-formed parser is
> > required.
> 
> That sounds good.
> 
> But IMO "all other cases" should currently include documents having DTDs with
> <!ELEMENTs in them.

It seems so, yes.

> If documents should in fact be able to demand "Hey, if you're a validating
> parser, validate me NOW!" (and there do seem to be some compelling occasions for
> it), a <!ELEMENT in the DTD doesn't impress me as the proper "switch" for it.

No, it didn't impress me much as a good switch either, but I thought
that it *was* the switch. I was wrong; there is currently no such
switch.
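
If such a switch did exist, deriving the required class of
implementation could look roughly like the sketch below. Everything in
it is an assumption for illustration: valid="yes" is NOT part of XML
1.0 (a conforming parser would reject that declaration today), the
function name required_parser_class is made up, and letting "valid"
take priority over "standalone" is my own guess at a sensible rule.

```python
import re

# Matches the XML declaration at the very start of the document, if present.
DECL = re.compile(r'^<\?xml\s+(.*?)\?>', re.S)
# Matches name="value" or name='value' pairs inside the declaration.
PSEUDO = re.compile(r'(\w+)\s*=\s*("([^"]*)"|\'([^\']*)\')')


def required_parser_class(document):
    """Return the class of implementation the declaration asks for."""
    attrs = {}
    match = DECL.match(document)
    if match:
        for name, _quoted, dq, sq in PSEUDO.findall(match.group(1)):
            attrs[name] = dq or sq
    # Hypothetical pseudo-attribute: not in XML 1.0, only floated in this thread.
    if attrs.get("valid") == "yes":
        return "full validating"
    if attrs.get("standalone") == "yes":
        return "minimal well-formed"
    # All other cases, including DTDs that contain element declarations.
    return "full well-formed"
```

Note that element declarations in the DTD play no part in the decision,
which is the point of the discussion above.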

Next question: should there be one? (In other words, is this something
that should be in the document instance?)

--
Chris



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

