RE: MSXML Whitespace handling

Subject: RE: MSXML Whitespace handling
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Wed, 02 Aug 2000 01:04:17 +0100
At 13:51 01/08/00 -0700, Andrew Kimball wrote:
>As for mangling by default, that is a beef with the design of the MS DOM,
>not with the conformance of MS XSL.  The MS DOM defaults towards performance
>and low memory consumption, while still staying within the XML 1.0 spec.  I
>think it was the right decision for the vast majority of users.  Users who
>need to preserve whitespace can always set preserveWhiteSpace=true when
>loading the DOM, or use xml:space="preserve" to tag significant whitespace.

As Andy says, it is a beef with the design of the MS DOM rather than MS XSL.

From a standards point of view, it all comes down to whether MS DOM is
counted as an XML processor or an XML application.  The XML Recommendation
states:

"A software module called an XML processor is used to read XML documents
and provide access to their content and structure. It is assumed that an
XML processor is doing its work on behalf of another module, called the
application. This specification describes the required behavior of an XML
processor in terms of how it must read XML data and the information it must
provide to the application."

Andy said:

"The application responsible for parsing the input XML and
building the tree cache is the DOM, not XSLT.  Therefore, it is perfectly
reasonable to view the DOM as the "application" referred to in the XML 1.0
spec."

It seems the job of MS DOM is to read in (parse) and provide access to the
content and structure of the XML document: squarely in the preserve of the
'XML processor' rather than the 'XML application'.  (If that's not the
case, how does MS DOM *apply* the information in the XML document as a
standalone application?)  It seems to me that it is MS XSL that actually
performs some action as a result of the XML: MS XSL is an XML application,
MS DOM is an XML processor.

In the section on White Space Handling (2.10) the XML Recommendation states:

"An XML processor must always pass all characters in a document that are
not markup through to the application. A validating XML processor must also
inform the application which of these characters constitute white space
appearing in element content."

Given that MS DOM is an XML processor, it should be passing the whitespace
within xsl:text through to MS XSL so that it can deal with it properly.
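
To make that concrete, here is a minimal sketch of what "passing all
characters through" looks like from the application's side. It uses Python's
xml.dom.minidom purely as a stand-in for a DOM that keeps whitespace-only
text nodes; it says nothing about how MSXML behaves internally. The
stylesheet string and the helper name xsl_text_contents are my own
assumptions for illustration.

```python
from xml.dom import minidom

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical one-template stylesheet whose xsl:text holds a single space.
STYLESHEET = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<xsl:template match="/">'
    '<xsl:text> </xsl:text>'
    '</xsl:template>'
    '</xsl:stylesheet>'
)


def xsl_text_contents(xml_source):
    """Return the character data of every xsl:text element, unmodified."""
    doc = minidom.parseString(xml_source)
    contents = []
    for element in doc.getElementsByTagNameNS(XSLT_NS, "text"):
        # Concatenate the text-node children exactly as the parser gave them.
        contents.append("".join(
            child.data for child in element.childNodes
            if child.nodeType == child.TEXT_NODE
        ))
    return contents


if __name__ == "__main__":
    # repr() makes the whitespace visible.
    print(repr(xsl_text_contents(STYLESHEET)))
```

If the DOM had discarded the whitespace-only text node, the XSLT layer
would see an empty xsl:text and have nothing to output.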

From a usability point of view, in my experience one of the main uses of
xsl:text is to add whitespace in some output.  I'm sure that it makes MS
DOM quicker and leaner not to worry about whitespace, but it seriously
detracts from its utility as an XML processor to be used by an XSLT
Processor like MS XSL.

If there were a normative XSLT DTD, and the XSLT DTD specified:

<!ATTLIST xsl:text
  xml:space	(preserve)	#FIXED	'preserve'>

then presumably MS DOM would preserve the whitespace within xsl:text.

As it is, the DTD that is supplied within the XSLT Recommendation is
non-normative and I imagine that most XSLT processors decide what to do on
the basis of an implicit understanding of the intention behind the
definitions given within the XSLT Recommendation rather than relying on an
explicit DTD.  It is clearly the intention within
[http://www.w3.org/TR/xslt#strip] that xsl:text should preserve whitespace;
XML applications that deal with XSLT should treat these elements as if they
had xml:space="preserve" declared on them.  As a compromise, could MS DOM
treat xsl:text as if xml:space="preserve" were defined on it?

Perhaps unfortunately, because it would be nice if a small compromise were
all that's needed, the rules governing whether whitespace is significant
within XSL elements are more complex than whether an element has
xml:space="preserve" or even whether it's an xsl:text element.  In XSLT,
you can define elements within which whitespace should be preserved using
xsl:preserve-space (in combination with xsl:strip-space).  If MS XSL is not
given sufficient information to process these elements according to the
XSLT Recommendation, then these elements are useless when used with it.  A
larger compromise would involve MS DOM treating all mixed-content and
#PCDATA XSLT elements as if xml:space="preserve" were defined on them.
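
For what it's worth, the stripping rules in the XSLT Recommendation can
only be applied by something that still has the whitespace in hand. The
following is a rough sketch, under my reading of
[http://www.w3.org/TR/xslt#strip], of how an XSLT layer might strip a tree
that a whitespace-preserving DOM has handed over. It uses Python's
xml.dom.minidom as the stand-in DOM; the function names and the
preserving_names parameter are hypothetical. For a stylesheet that set is
just xsl:text; for a source document it would be worked out from
xsl:preserve-space and xsl:strip-space, which this sketch does not attempt.

```python
from xml.dom import minidom, Node

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"
XML_WHITESPACE = " \t\r\n"


def xml_space_preserved(element):
    """True if the nearest xml:space setting on element or an ancestor is 'preserve'."""
    node = element
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        value = node.getAttribute("xml:space")
        if value == "preserve":
            return True
        if value == "default":
            return False
        node = node.parentNode
    return False


def strip_whitespace(element, preserving_names):
    """Remove whitespace-only text nodes that no rule says to keep.

    preserving_names is a set of (namespace URI, local name) pairs naming
    the elements whose whitespace-only text children must be kept.
    """
    for child in list(element.childNodes):
        if child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace(child, preserving_names)
        elif (child.nodeType == Node.TEXT_NODE
              and child.data.strip(XML_WHITESPACE) == ""):
            keep = ((element.namespaceURI, element.localName) in preserving_names
                    or xml_space_preserved(element))
            if not keep:
                element.removeChild(child)


# Hypothetical indented stylesheet: only the space inside xsl:text matters.
doc = minidom.parseString(
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\n'
    '  <xsl:template match="/">\n'
    '    <xsl:text> </xsl:text>\n'
    '  </xsl:template>\n'
    '</xsl:stylesheet>'
)
strip_whitespace(doc.documentElement, {(XSLT_NS, "text")})
```

The point of the sketch is the division of labour: the DOM keeps
everything, and the XSLT layer decides what to throw away. If the DOM
strips first, the decision has already been made before any of these rules
can be consulted.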

However, for true compliance as an XML processor, to avoid spurious
exceptions for XSLT elements, and to enable MS XSL (and, eventually, other
XML applications) to perform in a useful and compliant manner, MS DOM
should preserve whitespace by default.  If MS DOM does not, MS XSL should
use a conformant XML processor instead, to enable it to conform to the XSLT
Recommendation.

My 10p worth :)

Cheers,

Jeni

Dr Jeni Tennison
Epistemics Ltd * Strelley Hall * Nottingham * NG8 6PE
tel: 0115 906 1301 * fax: 0115 906 1304 * email: jeni.tennison@xxxxxxxxxxxxxxxx


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

