Subject: Re: "Roots" of confusion introduced at W3C (long)
From: AndrewWatt2000@xxxxxxx
Date: Fri, 22 Sep 2000 02:45:18 EDT
Bob,

Thanks for your reply. I appreciate its balanced, measured tone. That 
significantly helps a discussion which undoubtedly has the potential to 
become heated.

I am surprised that there has been no response to this thread from anyone at 
W3C. If the problems over what counts as a "root" are as complex and confusing 
in W3C documents as I believe they are, then I hope the silence is an 
indication that the issue is being seriously considered.

I acknowledge I could have phrased my initial post a little better. I hope I 
do better this time. :)

Let me return to the points you made about the XML 1.0 Recommendation.

You commented, referring to the XML 1.0 Recommendation: "I'm not claiming 
that it's all very well-organized." There we agree.

However, I think the problems with the XML 1.0 Recommendation are much more 
serious than you presently acknowledge.

Given the nature of our present discussion, I suggest that the claim in the 
first sentence of the Recommendation, that XML "is completely described in 
this document", is dubious at best.

However, let's move on to the more specific issues you raise.

In a message dated 20/09/00 22:37:54 GMT Daylight Time, 
Robert.DuCharme@xxxxxxxxxx writes:

>  >Thus, for this foundational concept of the "root" of an XML document we
>  >find multiple terms being, apparently, used for the same thing and certain
>  >terms being used for more than one thing.
>  
>  An XML document can be represented as various kinds of trees, which serve
>  different purposes, and any tree has a root that serves an important role 
>  in the use of that tree. There is no single concept of an XML document "root"
>  that serves all purposes; the meaning of "root" depends on the type of tree
>  representation of the XML document being discussed. 

If, as you suggest, the meaning of "root" depends on the tree representation 
of the document, why is that not explained in the XML 1.0 Recommendation? If, 
as I think likely, this foundational concept is not conveyed in the 
Recommendation, is that not a serious deficiency?

Does that failure of clear communication in the XML 1.0 Recommendation not 
lie at the root (sorry for the pun; it was just my natural flow :) ) of the 
consequent problems with the other W3C documents? If XML 1.0 did not 
clearly define what a "root" was, it is unsurprising that the confusion has 
been transmitted into other documents.

Further, if you are correct that the concept of "root" depends on the tree 
representation, why do the editors of the Recommendation use the term "root" 
with two distinct meanings in Section 2 and Section 2.1 of the Recommendation 
without adequate explanation?

>  When switching back and forth between XML-related specs, the difference in
>  the types of trees being discussed can be confusing. I don't understand
>  them all, but I do know the XML 1.0 spec pretty well, and it's much more
>  internally consistent than you make it out to be.

Bob, if someone with your experience can honestly admit to not understanding 
them all, what hope is there for newcomers to the field of XML?

I think you over-estimate the consistency of the XML 1.0 Recommendation.
  
>  The XML 1.0 Rec says that documents have a physical structure and a logical
>  structure. The document entity is the root of the physical structure. It's
>  the entity (on most operating systems, a file) that the parser reads in
>  first, looking for references to additional external entities to read in.
>  The root element is the root of the logical structure; it's the element
>  that contains all the other elements--the document element. The logical
>  structure doesn't care about the physical structure, and the physical
>  structure only cares about logical structure if each component of the
>  physical structure (each entity) wants to qualify as a well-formed entity. 
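A minimal sketch of the split Bob describes, assuming Python's standard-library `xml.etree.ElementTree` (nothing in the thread mentions Python). The parser reads the document entity, expands the entity reference, and hands back a logical tree rooted at the document element, with no record of where the entity boundary was. The entity name `intro` and the element names are hypothetical, and an internal entity is used only to keep the sketch self-contained; a real external entity would be a separate storage unit such as a file.

```python
import xml.etree.ElementTree as ET

# Hypothetical document entity. The internal DTD subset declares a general
# entity and the body refers to it. (Internal entity used only so the sketch
# needs no extra files.)
DOCUMENT_ENTITY = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY intro "Some introductory text.">
]>
<book>
  <chapter>&intro;</chapter>
</book>"""

# The parser reads the document entity and expands the entity reference
# while it builds the element tree.
root = ET.fromstring(DOCUMENT_ENTITY)

# The root of the logical structure is the document element, <book>,
# which contains every other element.
print(root.tag)                   # book
print(root.find("chapter").text)  # Some introductory text.

# The logical tree keeps only the replacement text, not the reference itself.
print("&intro;" in ET.tostring(root, encoding="unicode"))  # False
```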

I have several points here.

First, the supposed clear separation between physical and logical structure 
is more apparent than real. It is claimed in the Recommendation that the 
physical structures must "nest". But do they? As "physical structures"?

They do (or should) nest logically. But physically?

The Recommendation goes on to describe the document entity as the root of the 
"entity tree". Is there a physical "entity tree"? I think not. It is a 
logical relationship. The "document entity", which Section 2 claims is a 
"physical structure", is also, so it seems, the root of a logical "entity 
tree". Or would you wish to claim that a "physical" entity tree exists?

The separation of "physical" and "logical" structure in the XML 1.0 
Recommendation is much less clear than some might suppose.

In passing, I would point out that the nature of the "entity tree" is left 
undefined. Another illustration of the incompleteness of the document.

But the Recommendation later claims the document entity "has no name". In my 
file system the document entity, as a "physical structure", does have a name - 
"sample.xml", for example. So in what sense does Section 4.8 refer to the 
document entity having no name? Section 4.8 is referring to the document 
entity in another usage - when it is being **logically** combined with any 
other entities - in all likelihood after it has been transformed into some 
kind of tree structure. At that time the "document entity" is no longer in 
the same "physical structure". It is no longer the document entity as 
"physical structure" - it is another representation. Is it not? The term 
"document entity" in that context is being used to describe a **logical** 
representation of the physical structure.

So "document entity" is being used, without clear explanation, in the 
Recommendation in at least two senses: the "physical structure" itself, and 
some (undefined) transformation or interpretation of that physical structure.

Can you see how the Recommendation blurs and confuses the supposed 
distinction between "logical structure" and "physical structure"?
  
>  >XML 1.0 - "document entity" (Section 4.8). The terms "root node" and 
>  >"document root" do not occur in the XML 1.0 Recommendation.
>  
>  The DOM came after the XML spec, so the term "node" doesn't appear in the
>  Rec except for a reference in Appendix E to a classic computer science
>  work's description of finite state algorithms.

I appreciate that the DOM 1 Recommendation came later. It refers back to XML 
1.0 in the definition of "root node". As we have established, the XML 1.0 
Recommendation has no definition or description of a "root node". So the link 
from DOM 1.0 is to a vacuum, at least as far as that term is concerned.

>  The XML Rec never set out to define things in terms of nodes.

So, can we agree that the XML 1.0 Recommendation does not, as is claimed, 
"completely describe" XML?

>  Representations of XML documents that serve certain purposes, like XPath
>  and the DOM, later used the concept of a tree of nodes to describe their
>  representations.

I have no inherent difficulty with that concept being used. My point was that 
the Recommendations are not adequately linked. Surely we ought to have had 
"joined up thinking"? :)
  
>  >In addition XML 1.0 confuses the issue by using the term "document entity"
>  >to, apparently, refer to both the root of the tree (Section 4.8) and also
>  >the whole serialised document.
>  
>  The XML 1.0 Rec never mentions serialization either. Section 4.8 clearly
>  states that the document entity is the root of the *entity* tree (i.e. the
>  physical structure). Nowhere does the Rec imply that the document entity is
>  the whole document; a document entity can easily have references to other
>  entities that act as components of the document without being part of the
>  document entity.

Please see my comments earlier about section 4.8. It, in my view, subtly but 
profoundly confuses and undermines any clear distinction between logical and 
physical structure.

Section 4 of the Recommendation states that the document entity "may contain 
the whole document". I appreciate that it need not do so. However, it does 
support my comment that the document entity, as a term, is used in a way 
which seems, in some circumstances, to refer to the whole document.
  
>  >XML 1.0 further confuses the issue by using the term "root" (with no 
>  >qualifier) to refer to the "document element", a child of the "document 
>  >entity".
>  
>  The XML 1.0 spec *never* refers to the document element as a child of the
>  document entity. This confuses the physical and logical structure of an XML
>  document. 

Bob, as I pointed out earlier, the XML 1.0 Recommendation itself confuses the 
notions of the physical and logical structure.

>  (In XSLT, a document element node is a child of the source tree
>  node, but this is unrelated. Entities in general are meaningless to XSLT
>  because the XML parser that passes an input document to an XSLT processor
>  resolves all entities as it builds the source tree that XSLT actually works
>  on.)
>  
>  Outside of the XML Rec, the XPath Rec says that "XPath models an XML
>  document as a tree of nodes." This is the model that XSLT uses, and while
>  the DOM also talks in terms of trees of nodes, a DOM tree is different. 

I appreciate that the DOM tree is different. That difference is one of the 
sources of confusion that I mentioned in my earlier post.

I admit I could have missed it, but does any W3C document adequately explain 
those differences or their practical consequences? Should some W3C document 
not do so?
  
>  I'm not claiming that it's all very well-organized.

There we agree. :) But, as you will realise, my concerns go much further.

> Otherwise, there
>  wouldn't have been a need for the Infoset document, and Paul Prescod's talk
>  of groves wouldn't sound so useful. There is plenty of potential for
>  confusion, but if you remember that different tree representations of a
>  document (each with their own root) serve different purposes, it's a big
>  help in keeping better track of what's what.
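Bob's point that each tree representation has its own root can be sketched with two of Python's standard-library parsers (an assumption; the thread does not mention Python, and the sample document is hypothetical). For the same serialized text, ElementTree returns the document element as its root and, with its default parser, drops the top-level comment, while the DOM implementation in `xml.dom.minidom` returns a Document node, which is not an element, whose children are the comment and the document element. The XPath data model is a third representation and is not shown here.

```python
import xml.etree.ElementTree as ET
from xml.dom import Node, minidom

# Hypothetical document with a comment outside the document element.
SOURCE = "<!-- a comment outside the document element --><doc><p>text</p></doc>"

# ElementTree: the "root" is the document element itself; the default
# parser does not keep the top-level comment.
et_root = ET.fromstring(SOURCE)
print(et_root.tag)  # doc

# DOM: the root is a Document node. The document element is one of its
# children, alongside the comment.
dom = minidom.parseString(SOURCE)
print(dom.nodeType == Node.DOCUMENT_NODE)  # True
print([n.nodeType for n in dom.childNodes]
      == [Node.COMMENT_NODE, Node.ELEMENT_NODE])  # True
print(dom.documentElement.tagName)  # doc
print(dom.documentElement.parentNode is dom)  # True
```

The word "root" therefore names the document element in one API and a non-element container in the other, even though both were built from the same characters.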

I will ask the question separately in a parallel post, but just what 
terminology should people use to communicate clearly about which part of an 
XML document is being referred to?

I hope you can begin to see why I have serious concerns about the XML 1.0 
Recommendation and how it, on its own, contributes to the confusion in this 
matter. Of course, the differences in terminology between documents add to 
the problem.

Andrew Watt
  
>  Bob DuCharme          www.snee.com/bob           <bob@  


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

