Of Interoperability (was Re: [xsl] XPath over DOM)

Subject: Of Interoperability (was Re: [xsl] XPath over DOM)
From: Uche Ogbuji <uche.ogbuji@xxxxxxxxxxxxxxx>
Date: Thu, 15 Feb 2001 08:42:19 -0700
> Uche Ogbuji wrote:
> > 
> > > Are there any Java-based XSLT or XPath implementations (at least partial
> > > functionality) that can traverse over existent DOM object, as opposed to
> > > creating their own in-memory structure? I need to be able to provide my own
> > > Document object built by my own parser, not just DOM APIs over internal
> > > processor structure (as Saxon implementation).
> > 
> > Why would any Java XSLT processor have a problem working with another's DOM
> > nodes?
> > 
> > Could it be that even for mighty Java, interoperability is all illusory?
> > 
> 
> Hello
> 
> I guess (I may be wrong), that in this particular case we deal with
> another interoperability problem. This problem is caused by the nature
> of W3C recommendations, but is not related to Java.

I've already admitted that my remarks were poor form.  They were a response to 
the "interoperability wars".  Some of us argue that XSLT 1.1 hampers 
interoperability (ugly neologism, that).  The Java folks claim that it 
improves interoperability for Java users, which I think completely misses the 
point.

So imagine my surprise at the idea that Java XSLT implementations might have 
trouble operating with other implementations' DOM nodes, given that DOM has a 
Java language binding.

This and the JDOM story just illustrate the pitfalls that open up when 
standards bodies try to impose limited interoperability by fiat without 
considering broader interoperability and emerging practice.

> The issue is, at my opinion, that DOM (levels 1 and 2) and XSLT/XPath
> (and XML Infoset ??) are using different document models.

I agree this is an issue, but it didn't seem like the core issue here.  
Yevgeniy said that processors used their own DOM implementations.  Maybe he 
was unclear, but I had no reason to think he didn't mean "DOM".

In fact, 4XSLT transparently uses any Python DOM implementation (and we didn't 
need help from the spec for this, BTW) and provides wrappers that map it to 
the XPath model.  If text or SAX events are the source model rather than an 
existing DOM tree, then a special lightweight DOM-like XPath model is 
constructed with a set of optimizations.

Take document indexing.  If text or SAX events are the source, a cDomlette or 
pDomlette is created, and document order is assigned in the parsing pass.  If a DOM 
implementation is passed in, the DOM does not have a doc order index, so by 
default 4XSLT executes a separate indexing pass, and then proceeds to use the 
DOM nodes as they are.
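
To make the separate indexing pass concrete, here is a minimal sketch.  It is
not 4XSLT's actual code: it assumes Python's standard xml.dom.minidom as the
DOM implementation, and index_document_order is a hypothetical helper name.

```python
from xml.dom import minidom, Node


def index_document_order(doc):
    """Map every node reachable from `doc` to its document-order position.

    Hypothetical helper, for illustration only.  Attributes are numbered
    after their owner element and before its children, as XPath expects.
    """
    order = {}

    def visit(node):
        order[node] = len(order)
        if node.nodeType == Node.ELEMENT_NODE:
            attrs = node.attributes
            for i in range(attrs.length):
                order[attrs.item(i)] = len(order)
        for child in node.childNodes:
            visit(child)

    visit(doc)
    return order


doc = minidom.parseString('<a x="1"><b/>text<c/></a>')
order = index_document_order(doc)
# A node-set can now be sorted without touching the DOM nodes themselves.
a = doc.documentElement
b, text, c = a.childNodes
in_doc_order = sorted([c, a, b], key=order.__getitem__)
```

The index lives beside the tree rather than in it, so the DOM nodes are used
as they are.  The recursion is fine for a sketch; a very deep document would
want an explicit stack.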

I had no reason to think the Java implementations don't allow the same sort of 
facility.

> To be specific, namespaces are handled differently. In XSLT/XPath, every
> element node must have attached the distinct list of namespace nodes for
> all namespaces on the scope of that node. Any particular namespace
> declaration must be replicated as a unique namespace node on each
> descendant element of the element which originally declared that
> namespace. In DOM, there is no thing like a namespace node at all;
> namespace declarations appear as normal attribute nodes only once, on
> the node representing an element which originally declared that
> namespace.

No, but namespace nodes can be indexed and superimposed over the DOM.
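
As a rough illustration of superimposing namespace nodes, the sketch below
computes the in-scope namespaces of an element by walking up its ancestors.
It assumes xml.dom.minidom, which exposes declarations as xmlns attributes;
in_scope_namespaces is a hypothetical name, not a 4XSLT function.

```python
from xml.dom import minidom, Node

XML_NS = "http://www.w3.org/XML/1998/namespace"


def in_scope_namespaces(element):
    """Return {prefix: uri} for the namespaces in scope on `element`.

    The default namespace is stored under the empty prefix ''.
    Hypothetical helper, for illustration only.
    """
    scope = {}
    node = element
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            # setdefault keeps the nearest declaration and ignores
            # the ones it shadows further up the tree.
            if attr.name == "xmlns":
                scope.setdefault("", attr.value)
            elif attr.name.startswith("xmlns:"):
                scope.setdefault(attr.name[len("xmlns:"):], attr.value)
        node = node.parentNode
    # An empty default declaration (xmlns="") means no default namespace.
    if "" in scope and not scope[""]:
        del scope[""]
    # The xml prefix is always in scope in the XPath model.
    scope["xml"] = XML_NS
    return scope
```

A processor could cache the result per element, which gives the effect of one
set of namespace nodes on every element without copying anything into the
DOM.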

> There also some minor differences (in XSLT/XPath the parent of an
> attribute is its owner element; in DOM it is always null).

4XSLT handles this with attribute wrappers.
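
A wrapper of this kind can be very thin.  The following is only a sketch of
the idea, assuming xml.dom.minidom; XPathAttribute is a hypothetical class
name and does not reflect how 4XSLT's own wrappers are written.

```python
from xml.dom import minidom


class XPathAttribute:
    """Wrap a DOM Attr so that its parent is the owner element.

    Hypothetical class, for illustration only.
    """

    def __init__(self, attr):
        self._attr = attr

    @property
    def parentNode(self):
        # DOM reports None here; the XPath model wants the element.
        return self._attr.ownerElement

    def __getattr__(self, name):
        # Everything else is delegated to the wrapped DOM node.
        return getattr(self._attr, name)


doc = minidom.parseString('<a x="1"/>')
attr = doc.documentElement.getAttributeNode("x")
wrapped = XPathAttribute(attr)
```

Here attr.parentNode is None, as DOM requires, while wrapped.parentNode is
the a element, and the underlying node is left unchanged.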

> I do not claim that these two document models are incompatible. It is
> possible to implement mapping between these two models; one can create
> an API on the top of DOM, which implements the document model required
> by XSLT/XPath. 

Yep.  You anticipated my response.

> I guess (again, I may be wrong) that many XSLT vendors find
> implementation of such mapping impractical. Performance can be one
> reason. In particular, the specially designed representation of the
> source document can become a key factor in achieving a good performance,
> especially for processing large documents;

It takes some work and experimentation, but it can be done efficiently.

> so many XSLT processors may
> strongly rely on such (internal, proprietary) representation. It may be
> technically difficult (though not impossible) to write a single piece
> of code (not necessarily in Java) which can work with both proprietary
> data representation and with data represented using DOM.

Understood, but I assumed that their internal representation was DOM-based.

> Anyway, the fact that various branches of W3C recommendations are based
> on conceptually different document models is certainly alarming. It is
> very bad for interoperability; even worse, it is probably too late now
> to change anything. More problems may arise, once W3C will start to
> issue recommendations which rely on both DOM and XSLT/XPath
> specifications.

Agreed.  The infoset and XML 1.0 itself show some differences from XPath and 
DOM, as several XML-DEV threads have discussed.  I think this is yet another 
symptom of the W3C so often getting ahead of itself.

The clamoring of deep-pocketed vendors should not outweigh sensible design 
practice, but the reality is that if strong enough interests want something 
from the W3C, they seem to get it, regardless of its suitability given timing, 
precedent or practice.  The DOM's preceding the infoset is one example, and I 
think it's fair to say that XSLT illustrates more of the same.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@xxxxxxxxxxxxxxx               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

