At 04:31 AM 11/2/2004, Mike wrote:
Actually, XSLT 1.0 is a little ambiguous on this. It does recognize that the
source tree can be constructed by various routes; it is not necessarily the
direct result of parsing a source XML document. It mentions source trees
derived from a DOM as a specific example. XSLT 2.0 says the same thing much
more explicitly: you can construct a tree any way you like. Or any way your
vendor likes. Most vendors are forced into line by market forces, but some
seem to be able to sell their products regardless.
This is all fine -- I am not inclined to argue that all "XML" handled by
XSLT must start life as "XML" in the sense of the W3C Rec ... if it can sit
in the file system as HTML or SGML, or in an RDBMS or whatever, and is
presented to an XSLT processor through some means that builds a tree out of
it, that seems to me on balance to be a good thing.
But it's not MSXML the XSLT engine that fails conformance here: it's MSXML
the parser/processor, which does not report all whitespace to the
application. Apparently a decision was made at some point that the parser
could know better than the author of the application (me) which text nodes
were important and which ones were not. This seems to have been done in the
belief that without understanding full application semantics, a parser's
best option is to rely on a faulty principle to establish which whitespace
I want to see ... a principle by which, for example, whitespace appearing
with no other text between two elements in mixed content gets thrown away.
Almost any other decision -- pass it all through, examine sibling text
nodes before throwing away any whitespace, or consult a DTD or schema to
ensure #PCDATA wasn't allowed there before doing so -- would have been
better for me, even if a case could be made against any of them. But
because it can be debated what the "XML application" is in this case, an
argument can be made that MSXML is not, in fact, non-conformant.
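
To make the difference concrete, here is a minimal sketch in Python. It assumes the standard library's xml.dom.minidom as a stand-in for a parser that reports all whitespace to the application. The two stripping functions and the sample document are hypothetical illustrations of the rules discussed above; they are not MSXML's actual logic, only an approximation of the observable effect.

```python
from xml.dom import minidom, Node

XML_WS = " \t\r\n"  # the four whitespace characters defined by XML 1.0

# Hypothetical mixed-content sample: the space between </b> and <i> matters.
SOURCE = "<p>Say <b>Hello</b> <i>world</i>!</p>"


def is_ws_only(node):
    """True for a text node containing nothing but XML whitespace."""
    return node.nodeType == Node.TEXT_NODE and node.data.strip(XML_WS) == ""


def text_of(node):
    """Concatenate all descendant text nodes in document order."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def drop_all_whitespace_only(node):
    """Simulated lossy rule: discard every whitespace-only text node."""
    for child in list(node.childNodes):
        if is_ws_only(child):
            node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            drop_all_whitespace_only(child)


def drop_unless_mixed(node):
    """Sibling-aware rule: discard whitespace-only text nodes only when
    the parent has no sibling text node with real content."""
    children = list(node.childNodes)
    has_real_text = any(
        c.nodeType == Node.TEXT_NODE and not is_ws_only(c) for c in children
    )
    for child in children:
        if is_ws_only(child):
            if not has_real_text:
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            drop_unless_mixed(child)


# Pass everything through: the tree keeps the space.
root = minidom.parseString(SOURCE).documentElement
print(text_of(root))   # Say Hello world!

# Lossy rule: the two words run together.
drop_all_whitespace_only(root)
print(text_of(root))   # Say Helloworld!

# Sibling-aware rule: the space survives because <p> has other text.
root = minidom.parseString(SOURCE).documentElement
drop_unless_mixed(root)
print(text_of(root))   # Say Hello world!
```

The sibling-aware rule is itself only a heuristic: a mixed-content element whose only text is the space between two child elements would still lose it, which is why a case can be made against it too.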
It's not the spec itself that's unclear; rather, it's some murkiness at the
interface between specs (here, between XML parsing and XSLT: which is the
application?). Like Dimitre, I have some concern that similar variations
will be the rule in the more complex technologies to come. It's not that I
believe all "XML" must actually start as honest XML: processing "XML" (by
which I mean the tree-thing we build and then transform, not XML the data
format) is too useful for reasonable people not to want to do it.
Nor do I expect XML processors all to do exactly the same thing. But if the
whole point is that variation is a good thing because it allows me choices,
then I'll make choices. MSXML's whitespace-handling bug makes it not the
premier choice for the kind of work I do (where mixed content abounds),
irrespective of questions of conformance. That the unambiguously conformant
behavior (not throwing away the whitespace unasked) is, in this case, also
the right thing to do just makes it easier for me to choose.
Cheers,
Wendell
======================================================================
Wendell Piez mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================