Re: MSXML Whitespace handling

Subject: Re: MSXML Whitespace handling
From: Mike Brown <mike@xxxxxxxx>
Date: Wed, 2 Aug 2000 00:09:59 -0700 (PDT)

Andrew Kimball wrote:
> > the XSLT spec makes it sound like I
> > can expect whitespace characters in my physical document...
> 
> Where does the spec say "physical document"?  The spec uses the term "input
> tree"

I covered this in another post. I am not saying XSLT is not about
tree-to-tree transformation. 

The XSLT and XPath specs do mention in several places that it's not just any
tree they're talking about; it's a tree derived from an XML document. The DOM
complicates things because it lets people make trees that couldn't come from
XML documents. XSLT mentions in at least two places that the stylesheet
document is the basis of the stylesheet tree. In my opinion there is not
enough clarity in the specs regarding these issues. 

If your stylesheet document contains whitespace in xsl:text elements, and the
intent of making xsl:text the default whitespace-preserving element is that
one can in fact preserve whitespace in it, then it is reasonable to believe
that the whitespace will survive into the stylesheet tree.

I also think this is one of SGML/XML's greatest shortcomings: character
encoding, entities, and the difference between what I like to call the
physical document, the abstract document, and the logical contents of the
document are very poorly explained and widely misunderstood. People feel
that what they create in their text editor is "the document", only one step
removed from the trees being talked about. In fact it is an encoded entity
representing part of a sequence of ISO/IEC 10646-1 coded characters, which
in turn, through markup syntax, represent an abstract, hierarchically
related collection of data. 
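
As an illustration only, here is a minimal Python sketch of that layering. It
uses the standard library's ElementTree parser, which is an assumption of this
example and has nothing to do with MSXML. Three different byte sequences
(physical documents) decode to the same characters and yield the same content
in the tree.

```python
import xml.etree.ElementTree as ET

# Three different physical documents: the bytes differ in every case.
utf8 = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'.encode("utf-8")
latin1 = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'.encode("iso-8859-1")
charref = b'<?xml version="1.0"?><p>caf&#233;</p>'  # character reference instead of a raw byte

# Count how many distinct byte sequences we really have.
distinct_bytes = len({utf8, latin1, charref})

# The parser decodes each entity and hands back the same logical content.
texts = [ET.fromstring(doc).text for doc in (utf8, latin1, charref)]
```

What the author typed and saved differs in each case; what the tree contains
does not.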

This confusion is not at all alleviated when the specs toss around phrases
like "XML document" and "stylesheet document" and say very little about
possible points of confusion such as the interaction between xsl:text and
xml:space.

You're not at all wrong to say that MSXML and MS DOM are doing things that
they are allowed to do, and perhaps even James Clark will come along and say
that XSLT's whitespace preservation rules have nothing to do with xml:space
and that XSLT document authors should know better than to rely on things like
<xsl:text>&#10;</xsl:text> because of this. I'm just saying that the absence
of such clarity on this issue and those I mentioned above has already led to
quite a few stylesheets being authored without consideration of the
possibility that a parser might come along and muck with some carefully placed
bytes that represent whitespace characters, or that xsl:text wasn't going to
preserve quite as much whitespace as was anticipated by most people's casual
understanding of the forces at work.
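
A rough Python sketch of the failure mode, using the standard library's
minidom rather than MSXML. The function drop_whitespace_only_text is a
hypothetical stand-in for a DOM load that discards whitespace-only text nodes.
It is not MSXML's actual algorithm, only an approximation of the effect being
discussed.

```python
from xml.dom import minidom

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

STYLESHEET = (
    '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<xsl:template match="/">'
    '<xsl:text>&#10;</xsl:text>'
    '</xsl:template>'
    '</xsl:stylesheet>'
)

def text_of_xsl_text(doc):
    """Return the character data inside the first xsl:text element."""
    node = doc.getElementsByTagNameNS(XSL_NS, "text")[0]
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)

def drop_whitespace_only_text(node):
    """Hypothetical whitespace-dropping load: remove whitespace-only text nodes."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and child.data.strip() == "":
            node.removeChild(child)
        else:
            drop_whitespace_only_text(child)

preserved = minidom.parseString(STYLESHEET)
stripped = minidom.parseString(STYLESHEET)
drop_whitespace_only_text(stripped)

before = text_of_xsl_text(preserved)  # the newline the author placed
after = text_of_xsl_text(stripped)    # empty: the newline is gone before XSLT ever sees it
```

By the time XSLT's own whitespace rules get to look at the stylesheet tree in
the second case, xsl:text has nothing left to preserve.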

> > ...However inaccurate that may be, it would seem to be
> > preferable to preserve whitespace if you know that the DOM will be used as
> > the basis of a stylesheet tree.
> 
> MS DOM does not have this information.  When the user loads the DOM, they do
> not have to declare that it will be used as the basis of a stylesheet tree.
> Why should they have to?  They may use the same DOM as the basis of
> transforms, selections, and custom DOM API tree walks.  They may even
> perform these operations concurrently if the DOM is free-threaded.  They may
> have the same DOM running on their server for days, performing hundreds or
> thousands of transforms over its data.
> 
> The point is that the user has control over initial whitespace handling.
> XSLT has control only when the user begins a transform.

I understand and agree with these points. I was thinking of the masses who
are using IE5 to load and transform XML documents without using scripts to
call the tools separately. IE5 should certainly know when a document is going
to be used as the basis of a stylesheet, and it can invoke the parser
appropriately. Sure, it's not technically inappropriate to invoke the parser
without preservation of whitespace, but I still contend that preserving it is
preferable so that XSLT authors' expectations, however misguided, about
xsl:text behavior will be satisfied.

-Mike


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

