RE: [xsl] XPath over DOM

Subject: RE: [xsl] XPath over DOM
From: "Michael Kay" <mhkay@xxxxxxxxxxxx>
Date: Fri, 16 Feb 2001 09:17:26 -0000
> > Saxon doesn't do it (at present) because of the cost and complexity of (a)
> > sorting nodes into document order when they don't contain a serial number,
> I see the cost, but this should not exceed the cost of conversion.

Agreed. That's why I recognize the requirement. But in most cases, the input
doesn't arrive as a DOM, it arrives as source XML or a SAX stream, so no
conversion is necessary.

>  In abstract
> terms, it's a single tree traversal, with one assignment per node.

That's one way of doing it, but it means you have to have somewhere to put
the sequence number. You can't put it in the DOM objects themselves, so you
have to create a static wrapper, which involves creating more objects:
hopefully not one per node, otherwise you might as well rebuild the tree.
The approach Xalan and xt use (I believe) is to do the document order
comparison dynamically, by finding the lowest common ancestor of two nodes.
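
A minimal sketch of that dynamic comparison, written against the standard
org.w3c.dom interfaces. This is not taken from Xalan or xt; the class and
method names (DocumentOrder, pathFromRoot, compare, parse) are hypothetical.
It assumes both nodes have parents (so Attr and namespace nodes are not
handled) and that the DOM implementation returns the same Java object for the
same node, so that == is a valid identity test.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DocumentOrder {

    // Ancestor chain of n, outermost node (the Document) first, n itself last.
    static Deque<Node> pathFromRoot(Node n) {
        Deque<Node> path = new ArrayDeque<>();
        for (Node cur = n; cur != null; cur = cur.getParentNode()) {
            path.addFirst(cur);
        }
        return path;
    }

    // Negative if a precedes b in document order, positive if it follows, 0 if same node.
    static int compare(Node a, Node b) {
        if (a == b) {
            return 0;
        }
        Iterator<Node> pa = pathFromRoot(a).iterator();
        Iterator<Node> pb = pathFromRoot(b).iterator();
        Node common = null; // deepest ancestor shared so far
        while (pa.hasNext() && pb.hasNext()) {
            Node na = pa.next();
            Node nb = pb.next();
            if (na != nb) {
                if (common == null) {
                    throw new IllegalArgumentException("nodes are in different trees");
                }
                // na and nb are distinct children of the lowest common ancestor:
                // a precedes b exactly when nb is a later sibling of na.
                for (Node s = na.getNextSibling(); s != null; s = s.getNextSibling()) {
                    if (s == nb) {
                        return -1;
                    }
                }
                return 1;
            }
            common = na;
        }
        // One path is a prefix of the other: the ancestor comes first.
        return pa.hasNext() ? 1 : -1;
    }

    // Helper for the example only: parse a string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document d = parse("<a><b><c/></b><d/></a>");
        Node c = d.getElementsByTagName("c").item(0);
        Node dd = d.getElementsByTagName("d").item(0);
        System.out.println(compare(c, dd)); // c precedes d, so the result is negative
    }
}
```

Each comparison costs time proportional to the depth of the two nodes plus the
number of siblings under the common ancestor, but it needs no storage in or
alongside the DOM nodes.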

> > (b) skipping over and counting nodes correctly in the presence of things
> > such as entity reference nodes, CDATA nodes, and unnormalized text nodes,
> > and
> There is a normalize() if the user doesn't mind mutation.

Mutation of the supplied tree, I think, is out of the question. (This also
makes whitespace stripping much more difficult - another thing I forgot to

Incidentally, MSXML3 gets this wrong: using CDATA gives you multiple
adjacent text nodes. I think that's evidence that it's not easy: and they
have the advantage that they only work with their own DOM implementation.
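
To illustrate the non-mutating alternative to normalize(): a rough sketch that
treats each run of adjacent Text and CDATASection siblings as one logical XPath
text node, without touching the supplied tree. The names (LogicalText,
isTextual, startsRun, runValue, textChildren, parse) are hypothetical, and the
sketch assumes entity references have already been expanded by the parser, so
EntityReference nodes are not handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class LogicalText {

    // Text and CDATA section nodes both contribute to an XPath text node.
    static boolean isTextual(Node n) {
        return n != null
                && (n.getNodeType() == Node.TEXT_NODE
                    || n.getNodeType() == Node.CDATA_SECTION_NODE);
    }

    // True if n is the first node of a run of adjacent textual siblings.
    static boolean startsRun(Node n) {
        return isTextual(n) && !isTextual(n.getPreviousSibling());
    }

    // String value of the logical text node that begins at 'first'.
    static String runValue(Node first) {
        StringBuilder sb = new StringBuilder();
        for (Node n = first; isTextual(n); n = n.getNextSibling()) {
            sb.append(n.getNodeValue());
        }
        return sb.toString();
    }

    // One string per logical text-node child of 'parent'; the DOM is only read.
    static List<String> textChildren(Node parent) {
        List<String> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (startsRun(n)) {
                out.add(runValue(n));
            }
        }
        return out;
    }

    // Helper for the example only: keep CDATA sections as separate DOM nodes.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setCoalescing(false);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document d = parse("<p>one<![CDATA[ & two]]> three<b/>four</p>");
        // The DOM holds three textual nodes before <b/>, but XPath should see one.
        System.out.println(textChildren(d.getDocumentElement()));
    }
}
```

Counting positions (for example text()[2]) then has to be done over runs, not
over DOM children, which is where the skipping and counting cost comes from.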

> The rest, at least
> as I've attacked it, is a matter of wrapping, again in the
> same pass as doc-order indexing.

I'm thinking of doing it (eventually) in Saxon by dynamic wrapping using
flyweight objects, in the same way as the Saxon "tinytree" currently works.

> > (c) dealing with the multitude of ways that the DOM allows namespace
> > nodes to be (or not be) represented.
> ???  Do you mean Level 1 vs. Level 2?

That's part of the issue. Element and attribute names in the DOM can contain
a namespace URI, and that namespace URI may or may not be present in an
xmlns:xxx Attr node. The set of namespace nodes, as far as I can see, is the union of
namespaces that are used in element and attribute nodes plus namespaces that
are declared in xmlns:xxx pseudo-attributes, in the current element or in
any ancestor.
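
A sketch of computing that union for one element, using DOM Level 2 calls
(getNamespaceURI, getPrefix). The names (InScopeNamespaces, inScope, addUsed,
parse) are hypothetical. Two choices here are assumptions of mine, not
something the DOM prescribes: the nearest declaration or use wins when a
prefix appears more than once, and a namespace used in a name takes priority
over a conflicting xmlns attribute on the same element.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class InScopeNamespaces {

    // Prefix -> URI for the namespace nodes of 'start'; "" is the default namespace.
    static Map<String, String> inScope(Element start) {
        Map<String, String> ns = new TreeMap<>();
        // Walk outwards; putIfAbsent makes the nearest binding win.
        for (Node n = start; n != null && n.getNodeType() == Node.ELEMENT_NODE;
                n = n.getParentNode()) {
            addUsed(ns, n, true); // namespace used by the element name itself
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                String name = a.getNodeName();
                if (name.equals("xmlns")) {
                    ns.putIfAbsent("", a.getNodeValue());
                } else if (name.startsWith("xmlns:")) {
                    ns.putIfAbsent(name.substring("xmlns:".length()), a.getNodeValue());
                } else {
                    addUsed(ns, a, false); // namespace used by an attribute name
                }
            }
        }
        // xmlns="" undeclares the default namespace, so it yields no namespace node.
        ns.values().removeIf(String::isEmpty);
        // The xml prefix is always in scope.
        ns.put("xml", "http://www.w3.org/XML/1998/namespace");
        return ns;
    }

    // Record the namespace carried by a node's own name, if it has one.
    static void addUsed(Map<String, String> ns, Node n, boolean isElement) {
        String uri = n.getNamespaceURI();
        if (uri == null) {
            return;
        }
        String prefix = n.getPrefix();
        if (prefix == null) {
            if (!isElement) {
                return; // the default namespace never applies to attributes
            }
            prefix = "";
        }
        ns.putIfAbsent(prefix, uri);
    }

    // Helper for the example only: namespace-aware parse of a string.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document d = parse("<a xmlns='urn:d' xmlns:p='urn:p'><p:b xmlns:q='urn:q'/></a>");
        Element b = (Element) d.getDocumentElement().getFirstChild();
        System.out.println(inScope(b));
    }
}
```

A tree built programmatically can still defeat this, for instance an element
in no namespace under an ancestor that declares a default namespace, with no
xmlns="" attribute to say so. The sketch does not attempt to repair that case.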

Mike Kay
