Re: [xsl] XPath over DOM

Subject: Re: [xsl] XPath over DOM
From: Uche Ogbuji <uche.ogbuji@xxxxxxxxxxxxxxx>
Date: Fri, 16 Feb 2001 08:27:31 -0700
> >  In abstract
> > terms, it's a single tree traversal, with one assignment per node.
> 
> That's one way of doing it, but it means you have to have somewhere to put
> the sequence number. You can't put it in the DOM objects themselves, so you
> have to create a static wrapper, which involves creating more objects:
> hopefully not one per node, otherwise you might as well rebuild the tree.
> The approach Xalan and xt use (I believe) is to do the document order
> comparison dynamically, by finding the lowest common ancestor of two nodes.

Ah.  I see.  I guess this is the advantage of Python's dictionaries 
(associative arrays).  It makes such external decoration quite trivial, and 
quite efficient.  Perl folks gain the same advantage, and C folks can do so 
with a simple hash table.
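
To make that concrete, here is a minimal sketch of the kind of external
decoration I mean, written against the standard library's xml.dom.minidom.
The function name index_document_order is made up for illustration, and I am
assuming the DOM nodes are hashable by identity (true for minidom, not
guaranteed for every DOM implementation).

```python
from xml.dom import minidom


def index_document_order(doc):
    """Map every node to its document-order position in a single traversal.

    The positions live in a dictionary keyed by the node objects, so the
    DOM tree itself is never modified.
    """
    order = {}
    stack = [doc]
    while stack:
        node = stack.pop()
        order[node] = len(order)  # one assignment per node
        if node.nodeType == node.ELEMENT_NODE:
            # Attributes come after their element and before its children.
            # Their relative order is implementation-dependent.
            attrs = node.attributes
            for i in range(attrs.length):
                order[attrs.item(i)] = len(order)
        # Push children in reverse so the first child is popped first.
        stack.extend(reversed(node.childNodes))
    return order


def precedes(order, a, b):
    """True if node a comes before node b in document order."""
    return order[a] < order[b]
```

Once the dictionary is built, a document-order comparison is just an integer
comparison, with no need to look for a common ancestor at query time.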

> > > (b) skipping over and counting nodes correctly in the
> > presence of things
> > > such as entity reference nodes, CDATA nodes, and
> > unnormalized text nodes,
> > > and
> >
> > There is a normalize() if the user doesn't mind mutation.
> 
> Mutation of the supplied tree, I think, is out of the question. (This also
> makes whitespace stripping much more difficult - another thing I forgot to
> mention.)

Then you'd have to wrap with internal indices.  More complex.
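
Roughly, the wrapping could look like the sketch below: adjacent Text and
CDATA nodes are grouped into one logical XPath text node without touching the
supplied tree. This is only an illustration on top of xml.dom.minidom; the
names xpath_children and string_value are hypothetical, and entity reference
nodes and whitespace stripping are left out.

```python
from xml.dom import minidom, Node

# DOM node types that XPath folds into a single text node when adjacent.
TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def xpath_children(element):
    """Return the children of element as XPath would count them.

    Each run of adjacent Text/CDATA nodes becomes one tuple of DOM nodes;
    every other child is returned as-is. The DOM tree is not mutated.
    """
    result = []
    run = []
    for child in element.childNodes:
        if child.nodeType in TEXT_TYPES:
            run.append(child)
            continue
        if run:
            result.append(tuple(run))
            run = []
        result.append(child)
    if run:
        result.append(tuple(run))
    return result


def string_value(item):
    """String value of a logical text node (a tuple of Text/CDATA nodes)."""
    return "".join(node.data for node in item)
```

Positional counting such as text()[2] would then run over the list returned
by xpath_children rather than over childNodes directly.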

> Incidentally, MSXML3 gets this wrong: using CDATA gives you multiple
> adjacent text nodes. I think that's evidence that it's not easy: and they
> have the advantage that they only work with their own DOM implementation.

No one says it's easy.  It takes time and experimentation.

> > The rest, at least
> > as I've attacked it, is a matter of wrapping, again in the
> > same pass as doc-order indexing.
> 
> I'm thinking of doing it (eventually) in Saxon by dynamic wrapping using
> flyweight objects, in the same way as the Saxon "tinytree" currently works.
> 
> > > (c) dealing with the multitude of ways that the DOM allows namespace
> > > nodes to be (or not be) represented.
> >
> > ???  Do you mean Level 1 vs. Level 2?
> 
> That's part of the issue. Element and attribute names in the DOM can contain
> a namespace URI, the namespace URI may or may not be present in an xmlns:xxx
> Attr node. The set of namespace nodes, as far as I can see, is the union of
> namespaces that are used in element and attribute nodes plus namespaces that
> are declared in xmlns:xxx pseudo-attributes, in the current element or in
> any ancestor.

Oh.  We sort this out on our scan pass.  The algorithm is pretty simple, 
actually.

But most of what we do takes advantage of dictionaries, which helps a lot.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@xxxxxxxxxxxxxxx               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

