RE: [xsl] Collapsing run-on tag chains not working in saxon or xalan

From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 02 Nov 2004 12:32:58 -0500
At 04:31 AM 11/2/2004, Mike wrote:
Actually, XSLT 1.0 is a little ambiguous on this. It does recognize that the
source tree can be constructed by various routes, it is not necessarily the
direct result of parsing a source XML document. It mentions source trees
derived from a DOM as a specific example. XSLT 2.0 says the same thing much
more explicitly: you can construct a tree any way you like. Or any way your
vendor likes. Most vendors are forced into line by market forces, but some
seem to be able to sell their products regardless.

This is all fine -- I am not inclined to argue that all "XML" handled by XSLT must start life as "XML" in the sense of the W3C Rec ... if it can sit in the file system as HTML or SGML, or in an RDBMS or whatever, and is presented to an XSLT processor through some means that builds a tree out of it, that seems to me on balance to be a good thing.


But it's not MSXML the XSLT engine that fails conformance here: it's MSXML the parser/processor, which does not report all whitespace to the application. Apparently a decision was made at some point that the parser could know better than the author of the application (me) which text nodes were important and which ones were not. This seems to have been done in the belief that, without understanding full application semantics, a parser's best option is to rely on a faulty principle to establish which whitespace I want to see ... a principle by which, for example, whitespace appearing with no other text between two elements in mixed content gets thrown away.
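
To make the failure concrete, here is a minimal sketch in Python. It does not use MSXML at all: it uses the standard library's minidom, which does report the whitespace, and then simulates the principle described above with a hypothetical helper (drop_whitespace_only_text) that discards every whitespace-only text node. The sample paragraph is invented for illustration.

```python
from xml.dom import minidom

XML_WS = " \t\r\n"  # the four characters XML treats as whitespace

# Invented sample: mixed content with a lone space between two inline elements.
SOURCE = "<p>Some <b>bold</b> <i>italic</i> text.</p>"


def text_of(node):
    """Concatenate all descendant text nodes in document order."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def drop_whitespace_only_text(node):
    """Hypothetical stand-in for the heuristic: discard every text node
    that consists only of whitespace, without looking at its context."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and child.data.strip(XML_WS) == "":
            node.removeChild(child)
        else:
            drop_whitespace_only_text(child)


para = minidom.parseString(SOURCE).documentElement

full_text = text_of(para)        # tree as the parser reported it
drop_whitespace_only_text(para)
stripped_text = text_of(para)    # tree after the simulated heuristic

print(repr(full_text))
print(repr(stripped_text))
```

With the whitespace reported, the paragraph reads "Some bold italic text."; after the simulated stripping, the space between the b and i elements is gone and the words run together as "bolditalic". No stylesheet can put that space back, because it never reaches the stylesheet.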

Almost any other decision -- pass it all through, examine sibling text nodes before throwing away any whitespace, or consult a DTD or schema to ensure #PCDATA wasn't allowed there before doing so -- would have been better for me, even if a case could be made against any of them. But because it can be debated what the "XML application" is in this case, an argument can be made that MSXML is not, in fact, non-conformant.
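
As a rough illustration of the second alternative (examining sibling text nodes first), the following Python sketch drops a whitespace-only text node only when its parent has no sibling text node with real content. The function names and the sample document are my own assumptions for the purpose of the example, not anything MSXML or another processor actually provides.

```python
from xml.dom import minidom

XML_WS = " \t\r\n"  # the four characters XML treats as whitespace

# Invented sample: one mixed-content paragraph, one element-only list.
SOURCE = (
    "<doc>\n"
    "  <p>Some <b>bold</b> <i>italic</i> text.</p>\n"
    "  <list>\n"
    "    <item>one</item>\n"
    "  </list>\n"
    "</doc>"
)


def is_ws_text(node):
    """True for a text node made up only of XML whitespace."""
    return node.nodeType == node.TEXT_NODE and node.data.strip(XML_WS) == ""


def strip_cautiously(element):
    """Remove whitespace-only text children of `element` only if none of
    its text children carries real content (i.e. it does not look mixed)."""
    looks_mixed = any(
        c.nodeType == c.TEXT_NODE and not is_ws_text(c)
        for c in element.childNodes
    )
    for child in list(element.childNodes):
        if is_ws_text(child):
            if not looks_mixed:
                element.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_cautiously(child)


root = minidom.parseString(SOURCE).documentElement
strip_cautiously(root)

root_children = [c.nodeName for c in root.childNodes]
para_children = [c.nodeName for c in root.childNodes[0].childNodes]
list_children = [c.nodeName for c in root.childNodes[1].childNodes]
print(root_children, para_children, list_children)
```

Here the indentation inside doc and list is discarded, while the space between b and i survives because the p element has other text in it. The case against this approach is easy to see too: a paragraph consisting of nothing but two inline elements separated by a space would still lose that space, which is why consulting a DTD or schema, or simply passing everything through, may be the safer choice.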

It's not the spec itself that's unclear; rather, it's some murkiness at the interface between specs (here, between XML parsing and XSLT: which is the application?). Like Dimitre, I have some concern that similar variations will be the rule in the more complex technologies to come. It's not that I believe all "XML" must actually start as honest XML: processing "XML" (by which I mean the tree-thing we build and then transform, not XML the data format) is too useful not to expect reasonable people will want to do that. Nor do I expect XML processors all to do exactly the same thing. But if the whole point is that variation is a good thing because it allows me choices, then I'll make choices. MSXML's whitespace-handling bug makes it not the premier choice for the kind of work I do (where mixed content abounds), irrespective of questions of conformance. That the unambiguously conformant behavior (not throwing away the whitespace unasked), in this case, is also the right thing to do, just makes it easier for me to choose.

Cheers,
Wendell


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
