Re: Whitespaces against efficiency

I know it's off-topic for this list, so I'll post once and then refer you to
the DOM's own list -- see http://www.w3.org/DOM for details.

>I second that. I mean, I realize they are different, but I'm not familiar
>with the DOM. I have looked in vain for an explanation of the DOM model
>that is as straightforward as the XPath model. Any pointers appreciated.

I presume you've tried the DOM spec itself.

The XPath and DOM models are very similar, since both are tracking the
Infoset. The differences are mostly related to "round-tripping" -- writing
the document back out in a reasonably close approximation of the same
representation it had when it was read in -- plus a few minor stylistic
details.

Doing a _very_ quick scan through the XPath model and comparing it to the
DOM:


The DOM handles most contained text as child Nodes -- either Text nodes or
CDATASection nodes (a subclass of Text, corresponding to <![CDATA[ ]]>
blocks). Some node types (e.g. Attr) have a convenience accessor for
retrieving all contained text (nodeValue); Elements don't, because the text
might be mixed with further markup.
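
A quick sketch of the above, using Python's xml.dom.minidom only because it
is a handy DOM implementation to poke at (my choice for illustration; the
sample document is made up, and other implementations may differ in detail):

```python
from xml.dom import Node
from xml.dom.minidom import parseString, Text

# Hypothetical sample: text, child markup, and a CDATA block inside one element.
doc = parseString('<p class="note">Hello <b>bold</b><![CDATA[a < b]]></p>')
p = doc.documentElement

# Contained text shows up as child nodes, interleaved with the markup.
child_types = [child.nodeType for child in p.childNodes]
cdata = p.lastChild

# Attr offers nodeValue as a shortcut for its text; Element has no such value.
attr_text = p.getAttributeNode("class").nodeValue
element_value = p.nodeValue
```

Here the element's children come back as a Text node, an Element, and a
CDATASection, while nodeValue on the element itself is empty.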

What XPath calls an "expanded-name" is represented in DOM Level 2 as
separate namespace URI and localname fields on the node. There's also a
field for the prefix, which may or may not be present depending on how the
node was created. (If it isn't, then obviously you have to synthesize it if
you want to write out the document as XML syntax.)
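
To make the three fields concrete, here is a small sketch, again assuming
Python's xml.dom.minidom as the DOM at hand; the "urn:example" namespace and
the element names are invented for the example:

```python
from xml.dom.minidom import parseString, getDOMImplementation

# A parsed element carries all three pieces of its name.
parsed = parseString('<x:a xmlns:x="urn:example"/>').documentElement
parsed_name = (parsed.namespaceURI, parsed.localName, parsed.prefix)

# An element created through the API with an unprefixed qualified name
# still has its namespace URI and local name, but no prefix.
doc = getDOMImplementation().createDocument(None, "root", None)
made = doc.createElementNS("urn:example", "a")
made_name = (made.namespaceURI, made.localName, made.prefix)
```

Both nodes have the same expanded-name in XPath terms; only the prefix
differs, and that is the piece a serializer would have to invent.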

[Let's NOT get into XPath's interpretation of relative URIs in namespaces
right now.]

What XPath calls the root node is the Document node in a DOM.

Entity references in the DOM may be fully expanded (replaced by the
entity's contents) or not (in which case an EntityReference node appears,
with the entity's contents as children if available), depending on how the
DOM was generated. There are use cases for both views.
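
As an illustration of the fully-expanded view: Python's xml.dom.minidom (my
pick for a sketch, not the only way to do it) appears to take that route
when parsing, so no EntityReference node shows up. The entity name and
document below are made up:

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical document with one internal entity declared in the DTD subset.
source = '<!DOCTYPE r [<!ENTITY who "world">]><r>hello &who;!</r>'
root = parseString(source).documentElement

# The entity's replacement text is merged into the surrounding Text node.
kinds = [child.nodeType for child in root.childNodes]
text = root.firstChild.data
has_entity_ref_nodes = Node.ENTITY_REFERENCE_NODE in kinds
```

A DOM built the other way would instead show an EntityReference child named
"who" between two Text nodes.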

DOM Level 2 adds a getElementById() method. (Though a DOM API for the DTD
or schema has been deferred to Level 3.)
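
One consequence worth sketching: the lookup only works if the DOM knows
which attributes are ID-typed. The example assumes Python's xml.dom.minidom
and flags the attribute by hand with setIdAttribute(), which is a later
(Level 3) addition rather than part of Level 2; the document is invented:

```python
from xml.dom.minidom import parseString

doc = parseString('<r><item id="a1"/><item id="a2"/></r>')

# No DTD here, so nothing marks "id" as an ID-typed attribute yet.
before = doc.getElementById("a2")

# Flag ID-ness by hand on each item, then look the element up again.
for item in doc.getElementsByTagName("item"):
    item.setIdAttribute("id")

after = doc.getElementById("a2")
```

An attribute merely named "id" is not enough; without type information the
first lookup finds nothing.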

DOM attribute nodes do not have a parent, but do have an ownerElement. The
distinction is one of symmetry: in the DOM, if you have a parent you're a
child, and Attrs are not children of an Element in the usual sense. (They
live in a NamedNodeMap associated with the Element.) The DOM does let you
distinguish between defaulted and explicit attribute values, via the Attr's
"specified" flag.
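
The parent/owner asymmetry is easy to see in practice. A minimal sketch,
assuming Python's xml.dom.minidom and a made-up one-element document:

```python
from xml.dom.minidom import parseString, NamedNodeMap

el = parseString('<a href="x.html">link</a>').documentElement
attr = el.getAttributeNode("href")

# An Attr has an owner but no parent, and is not among the element's children.
attr_parent = attr.parentNode
attr_owner = attr.ownerElement
in_children = any(child is attr for child in el.childNodes)

# It is reachable through the element's attribute map instead.
in_map = el.attributes.getNamedItem("href") is attr
```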

The DOM doesn't have "namespace nodes" -- namespace declaration attributes
appear as attributes (if they appear at all, which they may not in a
programmatically-created DOM), and a node's namespace URI is bound to it at
the time the node is created. (If the declaration isn't present, then
obviously you have to synthesize it if you want to write out the document
as XML syntax.)
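
A sketch of both cases, assuming Python's xml.dom.minidom; the namespace URI
and names are invented, and adding the declaration by hand as shown is just
one way a serializer's fix-up step might look:

```python
from xml.dom import XMLNS_NAMESPACE
from xml.dom.minidom import parseString, getDOMImplementation

# Parsed: the declaration is visible as an ordinary attribute.
parsed = parseString('<x:a xmlns:x="urn:example"/>').documentElement
parsed_decl = parsed.getAttribute("xmlns:x")

# Built through the API: the URI is bound to the node, but no declaration exists.
doc = getDOMImplementation().createDocument("urn:example", "x:a", None)
built = doc.documentElement
built_has_decl = built.hasAttribute("xmlns:x")
unfixed = built.toxml()

# Synthesize the missing declaration before writing the element out.
built.setAttributeNS(XMLNS_NAMESPACE, "xmlns:x", built.namespaceURI)
fixed = built.toxml()
```

Before the fix-up the serialized element uses a prefix that nothing
declares; afterwards it carries its own xmlns:x attribute.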

Re Text nodes: The DOM will normally be delivered in normalized form, with
all contiguous text merged into a single node. The normalize() operation
will put a subtree back in this form after editing. Exception:
CDATASections are _not_ merged with adjacent Text (or other CDATASections)
during normalization, to allow them to be written back out in the same form
they were read in. As noted above, characters inside attribute values _are_
represented as Text node children of the Attrs, because they might be
interleaved with "unflattened" EntityReference nodes.
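
The normalization behaviour can be sketched as follows, assuming Python's
xml.dom.minidom and a hand-built tree with deliberately fragmented text:

```python
from xml.dom import Node
from xml.dom.minidom import getDOMImplementation

doc = getDOMImplementation().createDocument(None, "r", None)
root = doc.documentElement

# Simulate editing: two adjacent Text nodes, a CDATA section, more text.
for node in (doc.createTextNode("a"), doc.createTextNode("b"),
             doc.createCDATASection("c"), doc.createTextNode("d")):
    root.appendChild(node)

count_before = len(root.childNodes)
root.normalize()

# Adjacent Text nodes merge; the CDATASection stays a separate node.
after = [(child.nodeType, child.data) for child in root.childNodes]
```

The four children collapse to three: "ab" as Text, "c" still as a
CDATASection, and "d" as Text.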

Given the subject line, your question may be about "ignorable" whitespace
-- that is, whitespace appearing "in element context", such as indentation
within an element known not to accept #PCDATA content. The brief answer is
that the DOM doesn't yet distinguish this -- doing so requires knowing
details of the DTD or schema, which won't be available until DOM Level 3.
Various DOM implementations have implemented their own custom solutions,
typically via something like a Text.isIgnorableWhitespace() method... but
there's no standard yet.
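
In the meantime, an application that already knows its content models can
filter by hand. The helper below is purely hypothetical (strip_ignorable and
its element_only argument are my own names, not any DOM API), sketched with
Python's xml.dom.minidom; the caller supplies the knowledge a DTD would
otherwise provide:

```python
from xml.dom import Node
from xml.dom.minidom import parseString

XML_WHITESPACE = " \t\r\n"

def strip_ignorable(element, element_only):
    """Drop whitespace-only Text children of elements named in element_only."""
    for child in list(element.childNodes):
        if child.nodeType == Node.ELEMENT_NODE:
            strip_ignorable(child, element_only)
        elif (child.nodeType == Node.TEXT_NODE
              and element.tagName in element_only
              and not child.data.strip(XML_WHITESPACE)):
            element.removeChild(child)

# Hypothetical document: <list> takes only elements, <item> takes text.
source = "<list>\n  <item>a</item>\n  <item> </item>\n</list>"
root = parseString(source).documentElement
strip_ignorable(root, element_only={"list"})
result = root.toxml()
```

The indentation inside <list> is dropped, but the single space inside the
second <item> survives, since that element is not declared element-only.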



Beyond that, most of the DOM is just APIs and convenience methods for
navigating and manipulating this model.

______________________________________
Joe Kesselman  / IBM Research




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list