Subject: Re: Whitespaces against efficency
From: keshlam@xxxxxxxxxx
Date: Wed, 12 Apr 2000 08:42:55 -0400

I know it's off-topic for this list, so I'll post once and then refer you to the DOM's own list -- see http://www.w3.org/DOM for details thereon.

>I second that. I mean, I realize they are different, but I'm not familiar
>with the DOM. I have looked in vain for an explanation of the DOM model
>that is as straightforward as the XPath model. Any pointers appreciated.

I presume you've tried the DOM spec itself.

The XPath and DOM models are very similar, since both are tracking the Infoset. The differences are mostly related to "round-tripping" -- writing the document back out in a reasonably close approximation of the same representation it had when it was read in -- plus a few minor stylistic details.

Doing a _very_ quick scan through the XPath model and comparing it to the DOM:

The DOM handles most contained text as child Nodes -- either Text nodes or CDATASection nodes (a subclass of Text, corresponding to <![CDATA[ ]]> blocks). Some node types (e.g. Attr) have a convenience method for retrieving all contained text (nodeValue); Elements don't, because the text might be mixed with further markup.

What XPath calls an "expanded-name" is represented in DOM Level 2 as separate namespace URI and local name fields on the node. There's also a field for the prefix, which may or may not be present depending on how the node was created. (If it isn't, then obviously you have to synthesize it if you want to write out the document as XML syntax.) [Let's NOT get into XPath's interpretation of relative URIs in namespaces right now.]

What XPath calls the root node is the Document node in a DOM.

Entity references in the DOM may be fully expanded (replaced by the entity's contents) or not (in which case an EntityReference node appears, with the entity's contents as children if available), depending on how the DOM was generated. There are use cases for both views.

DOM Level 2 adds a getElementById() method. (Though a DOM API for the DTD or schema has been deferred to Level 3.)

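To make the points so far concrete, here is a minimal sketch using Python's standard xml.dom.minidom, chosen only because it is a small, readily available DOM implementation; other DOMs may differ in detail. The sample document, the urn:example namespace, and the text_of() helper are all made up for illustration and are not part of the DOM API.

```python
from xml.dom import minidom, Node

# Hypothetical sample document: mixed text, a CDATA block, and a child element.
XML = ('<p:note xmlns:p="urn:example" lang="en">'
       'Hello <![CDATA[<raw>]]><p:b/> world</p:note>')

doc = minidom.parseString(XML)

# What XPath calls the root node is the Document node.
root = doc.documentElement
is_document = doc.nodeType == Node.DOCUMENT_NODE and root.parentNode is doc

# The expanded-name is held as separate fields, with the prefix alongside.
name_parts = (root.namespaceURI, root.localName, root.prefix)

# Contained text appears as child nodes; CDATA stays a distinct node type.
kinds = [child.nodeType for child in root.childNodes]

# An Attr can hand back its text directly through nodeValue.
lang = root.getAttributeNode("lang").nodeValue


def text_of(node):
    """Hypothetical helper: gather all text below a node, skipping markup."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


print(is_document, name_parts, kinds, lang, text_of(root))
```

An Element needs something like text_of() because its text can be interleaved with child elements, as it is in the sample above.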
DOM attribute nodes do not have a parent, but do have an ownerElement. The distinction is one of symmetry: in the DOM, if you have a parent you're a child, and Attrs are not children of an Element in the usual sense. (They live in a NamedNodeMap associated with the Element.)

The DOM does let you distinguish between defaulted and explicit attribute values.

The DOM doesn't have "namespace nodes" -- namespace declaration attributes appear as attributes (if they appear at all, which they may not in a programmatically-created DOM), and a node's namespace URI is bound to it at the time the node is created. (If the declaration isn't present, then obviously you have to synthesize it if you want to write out the document as XML syntax.)

Re Text nodes: The DOM will normally be delivered in normalized form, with all contiguous text merged into a single node. The normalize() operation will put a subtree back in this form after editing. Exception: CDATASections are _not_ merged with adjacent Text (or other CDATASections) during normalization, to allow them to be written back out in the same form they were read in.

Note that characters inside attribute values _are_ represented as Text node children of the Attrs, because they might be interleaved with "unflattened" EntityReference nodes.

Given the subject line, your question may be about "ignorable" whitespace -- that is, whitespace appearing "in element context", such as indentation within an element known not to accept #PCDATA content. The brief answer is that the DOM doesn't yet distinguish this -- doing so requires knowing details of the DTD or schema, which won't be available until DOM Level 3. Various DOM implementations have implemented their own custom solutions, typically via something like a Text.isIgnorableWhitespace() method... but there's no standard yet.

Beyond that, most of the DOM is just APIs and convenience methods for navigating and manipulating this model.

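A rough sketch of the Attr, namespace, normalization, and whitespace points, again using Python's xml.dom.minidom purely as a convenient example implementation. The sample document, the urn:example namespace, the whitespace_only_text() helper, and the ELEMENT_ONLY set are assumptions invented for illustration; the last stands in for the DTD knowledge that the DOM itself does not provide.

```python
from xml.dom import minidom, Node

doc = minidom.parseString('<list kind="todo">\n  <item/>\n</list>')
root = doc.documentElement

# An Attr has an ownerElement but no parent.
attr = root.getAttributeNode("kind")
owner, parent = attr.ownerElement, attr.parentNode

# A programmatically created node carries its namespace URI, but no
# xmlns declaration attribute exists unless the program adds one.
made = doc.createElementNS("urn:example", "p:made")
serialized = made.toxml()  # no xmlns:p here; a writer must synthesize it

# normalize() merges adjacent Text nodes but leaves CDATASections alone.
for piece in (doc.createTextNode("a"), doc.createTextNode("b"),
              doc.createCDATASection("c"), doc.createTextNode("d")):
    made.appendChild(piece)
made.normalize()
merged = [(node.nodeType, node.data) for node in made.childNodes]


def whitespace_only_text(element):
    """Hypothetical helper: Text children that contain only whitespace."""
    return [node for node in element.childNodes
            if node.nodeType == Node.TEXT_NODE and not node.data.strip()]


# Whether such text is "ignorable" is the caller's assumption, not the DOM's.
ELEMENT_ONLY = {"list"}  # assumed knowledge standing in for a DTD
candidates = whitespace_only_text(root) if root.tagName in ELEMENT_ONLY else []

print(owner is root, parent, serialized, merged, len(candidates))
```

The whitespace check only finds candidates. Deciding that they can safely be dropped still depends on outside knowledge of the content model, which is why implementations have had to invent their own answers.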
______________________________________
Joe Kesselman / IBM Research

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list