xsl:text whitespace preservation vs xml:space (was: Re: microsoft latest)

Subject: xsl:text whitespace preservation vs xml:space (was: Re: microsoft latest)
From: Mike Brown <mike@xxxxxxxx>
Date: Mon, 31 Jul 2000 15:43:46 -0700 (PDT)
Andrew Kimball wrote:
> The spec explicitly does not specify that the XSLT processor must
> construct the tree.

True, implementation is not specified; but the result must be *as if*
it has been processed according to the data model and template processing
model prescribed by XPath and XSLT. I would not construe this as leeway
for text children of xsl:text elements to be mangled when the spec says
they're to be handled a certain way, w.r.t. whitespace.

> Furthermore, the input tree may have gone through any number of
> transformations before reaching the XSLT processor.  In the MSXML case, the
> tree is actually constructed by the MS DOM implementation, in accordance
> with XML 1.0 rules.  These rules state that in the absence of any in-scope
> xml:space="preserve" elements, the whitespace processing rules are
> application dependent.

To the extent that the DOM is the application, that is correct and not
at all unreasonable.

> Therefore, it is conformant for the DOM to strip
> this whitespace on load and construction of the tree.  It is also valid to
> allow users to transform the tree (including stripping any whitespace they
> please) using the DOM API before sending it along to the XSL processor.

I agree with you here, and I think I see the real problem.

XSLT and XPath are designed to work with the logical contents of XML
documents, using one particular node tree model we all love & hate.

There are a couple of ways for an XSLT processor to determine what the
logical contents of XML documents (including the stylesheets) are. You can
get SAX events from a parser as it looks at the physical entities that
comprise a document or you can get a DOM object that was created perhaps by
parsing or perhaps "by hand".

Those representations might be acted upon directly during XSLT processing,
or they might be converted to some other representation that saves memory
and/or is easier to process. The specs don't
dictate these implementations, only expected behaviors. To be a conforming
XSLT/XPath application of XML, your application has to behave as if it has
implemented the node tree and template processing model that is specified.

A problem arises because the specs do not dictate that the trees have to be
structured a certain way (i.e., as if they were based on well-formed XML
documents), which leaves room for interpretation on things like these
whitespace issues.

The DOM may have some redeeming values, but it does leave open the nasty
possibility of having created a tree that contains things that can't be in a
well-formed XML document, either by the tree's structure (e.g. multiple
document elements) or by its contents (e.g., illegal characters).

And then there is this situation where the XML parser that created the DOM
version (or even the SAX event stream version) of a logical XML document
might have rightly exercised its option to ignore whitespace where it didn't
find xml:space="preserve".
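
To make the scenario concrete, here is a minimal sketch in Python using the
standard library's xml.dom.minidom. It is not how MSXML or any particular
parser behaves; the helper names (strip_whitespace_only_text,
xsl_text_content) and the sample stylesheet are made up for illustration. It
simulates a loader that drops whitespace-only text nodes wherever
xml:space="preserve" is not in scope, and then looks at what is left inside
xsl:text.

```python
from xml.dom import minidom, Node

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical stylesheet; %s is a slot for extra attributes on xsl:text.
STYLESHEET = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\n'
    '  <xsl:template match="/">\n'
    '    <xsl:text%s>&#9;&#10;</xsl:text>\n'
    '  </xsl:template>\n'
    '</xsl:stylesheet>'
)

def strip_whitespace_only_text(node, preserve=False):
    """Drop whitespace-only text nodes unless xml:space="preserve" is in scope."""
    if node.nodeType == Node.ELEMENT_NODE:
        space = node.getAttributeNS(XML_NS, "space")
        if space == "preserve":
            preserve = True
        elif space == "default":
            preserve = False
    # Iterate over a copy because children may be removed along the way.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if not preserve and child.data.strip(" \t\r\n") == "":
                node.removeChild(child)
        else:
            strip_whitespace_only_text(child, preserve)

def xsl_text_content(source, strip):
    """Return the text inside the first xsl:text, optionally after stripping."""
    doc = minidom.parseString(source)
    if strip:
        strip_whitespace_only_text(doc)
    text_el = doc.getElementsByTagNameNS(XSL_NS, "text")[0]
    return "".join(child.data for child in text_el.childNodes)

if __name__ == "__main__":
    print(repr(xsl_text_content(STYLESHEET % "", strip=False)))
    print(repr(xsl_text_content(STYLESHEET % "", strip=True)))
    print(repr(xsl_text_content(STYLESHEET % ' xml:space="preserve"', strip=True)))
```

Without stripping, the xsl:text element still holds the tab and newline.
After the simulated strip it is empty, unless xml:space="preserve" is
present on xsl:text or an ancestor. An XSLT processor handed the stripped
tree has no way of knowing the whitespace was ever there.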

So I believe that there is an oversight in the XSLT spec with regard to
xsl:text and whitespace preservation. Given that whitespace may be stripped
or otherwise mangled before the tree is created, the usefulness of preserving
whitespace in xsl:text is pretty limited. It means that things like

<xsl:text>	
</xsl:text>

<xsl:text>&#9;&#10;</xsl:text>

cannot be relied upon and should be deprecated in favor of <xsl:value-of
select="'&#9;&#10;'"/>, where the whitespace is written as character
references inside a quoted XPath string literal.
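
As a rough check on why the attribute form is more robust, the sketch below
(Python, xml.dom.minidom; the helper name parsed_select is made up for
illustration) parses an xsl:value-of element and inspects the select
attribute as the parser reports it. It assumes the select value is written
as a quoted XPath string literal. Attribute values are not text nodes, so
whitespace-only text stripping does not touch them. Under XML 1.0
attribute-value normalization, though, a literal tab or newline becomes a
space, whereas character references survive as the characters they name.

```python
from xml.dom import minidom

def parsed_select(select_markup):
    """Parse an xsl:value-of element and return its select attribute value."""
    source = (
        '<xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" '
        'select="%s"/>' % select_markup
    )
    return minidom.parseString(source).documentElement.getAttribute("select")

# Whitespace written as character references inside the string literal.
with_refs = parsed_select("'&#9;&#10;'")

# The same whitespace typed literally (a real tab and newline in the markup).
with_literals = parsed_select("'\t\n'")

if __name__ == "__main__":
    print(repr(with_refs))
    print(repr(with_literals))
```

So the character references matter: typed literally, the tab and newline
reach the XSLT processor as two spaces.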

I am speculating here, but it looks like this part of the XSLT spec may have
been written under the assumption that not only would the stylesheet always
be a well-formed XML document ("A transformation in the XSLT language is
expressed as a well-formed XML document" and "Normally an XSLT stylesheet is
a complete XML document") but that this document, when used as the basis for
the stylesheet node tree, would be parsed such that whitespace specified in
the document would be preserved, at least in xsl:text elements.

If that's not the case, then I'm surprised this hasn't come up before, and I
think it warrants some clarification, especially since the gurus on this list
have been known to advocate using xsl:text to insert whitespace in the result
tree.

Also, there are other loopholes in the DOM. DOM implementations let you throw
any characters you want into the tree, not just XML-allowed characters. They
also let you haphazardly construct trees that are different than what you
could get by parsing an XML document. XSLT and XPath do very little to
acknowledge or prescribe how these situations should be handled, especially
for the stylesheet tree.
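
The illegal-character loophole is easy to reproduce. The following is only a
sketch against Python's xml.dom.minidom, one DOM implementation among many,
and the helper name round_trips is made up for illustration. It builds a
tree by hand, serializes it, and checks whether a parser will accept the
result.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def round_trips(text):
    """Build <doc>text</doc> via the DOM API, serialize it, and try to reparse."""
    doc = minidom.getDOMImplementation().createDocument(None, "doc", None)
    # The DOM accepts any string here; no check against XML's Char production.
    doc.documentElement.appendChild(doc.createTextNode(text))
    serialized = doc.toxml()
    try:
        minidom.parseString(serialized)
    except ExpatError:
        return False
    return True

if __name__ == "__main__":
    print(round_trips("plain text"))
    print(round_trips("\x01"))  # U+0001 is not an allowed XML 1.0 character
```

The tree holding U+0001 is a perfectly usable DOM object, yet no well-formed
XML 1.0 document corresponds to it, so its serialization is rejected on
reparse.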

-Mike


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

