Subject: xsl:text whitespace preservation vs xml:space (was: Re: microsoft latest)
From: Mike Brown <mike@xxxxxxxx>
Date: Mon, 31 Jul 2000 15:43:46 -0700 (PDT)
Andrew Kimball wrote:
> The spec explicitly does not specify that the XSLT processor must
> construct the tree.

True, implementation is not specified; but the result must be *as if* it has been processed according to the data model and template processing model prescribed by XPath and XSLT. I would not construe this as leeway for text children of xsl:text elements to be mangled when the spec says they're to be handled a certain way, w.r.t. whitespace.

> Furthermore, the input tree may have gone through any number of
> transformations before reaching the XSLT processor. In the MSXML case, the
> tree is actually constructed by the MS DOM implementation, in accordance
> with XML 1.0 rules. These rules state that in the absence of any in-scope
> xml:space="preserve" elements, the whitespace processing rules are
> application dependent.

To the extent that the DOM is the application, that is correct and not at all unreasonable.

> Therefore, it is conformant for the DOM to strip
> this whitespace on load and construction of the tree. It is also valid to
> allow users to transform the tree (including stripping any whitespace they
> please) using the DOM API before sending it along to the XSL processor.

I agree with you here, and I think I see the real problem.

XSLT and XPath are designed to work with the logical contents of XML documents, using one particular node tree model we all love & hate. There are a couple of ways for an XSLT processor to determine what the logical contents of XML documents (including the stylesheets) are. You can get SAX events from a parser as it looks at the physical entities that comprise a document, or you can get a DOM object that was created perhaps by parsing or perhaps "by hand". The way those representations are implemented might be acted upon directly during XSLT processing, or they might be converted to some other representation that saves memory and/or is easier to process.
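As a rough illustration of the two routes just described, here is a minimal Python sketch that reads the same stylesheet once as SAX events and once as a DOM tree and collects the text inside xsl:text. The stylesheet string, the `TextCollector` class, and the two helper functions are hypothetical names chosen for this example, and the standard-library parsers used here are only a stand-in for whatever a real XSLT processor uses. These particular parsers happen to keep the whitespace; nothing in the sketch says another parser or loader must.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical stylesheet: a single space inside xsl:text.
STYLESHEET = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<xsl:template match="/"><xsl:text> </xsl:text></xsl:template>'
    '</xsl:stylesheet>'
)


class TextCollector(xml.sax.ContentHandler):
    """Record character data reported while inside an xsl:text element."""

    def __init__(self):
        super().__init__()
        self.in_text = False
        self.chunks = []

    def startElement(self, name, attrs):
        # Namespace processing is off by default, so the raw
        # qualified name is what the handler receives.
        if name == "xsl:text":
            self.in_text = True

    def endElement(self, name):
        if name == "xsl:text":
            self.in_text = False

    def characters(self, content):
        if self.in_text:
            self.chunks.append(content)


def text_via_sax(source):
    """Route 1: logical contents delivered as a stream of SAX events."""
    handler = TextCollector()
    xml.sax.parseString(source.encode("utf-8"), handler)
    return "".join(handler.chunks)


def text_via_dom(source):
    """Route 2: logical contents delivered as a DOM tree."""
    doc = minidom.parseString(source)
    node = doc.getElementsByTagName("xsl:text")[0]
    return "".join(child.data for child in node.childNodes)


if __name__ == "__main__":
    print(repr(text_via_sax(STYLESHEET)))
    print(repr(text_via_dom(STYLESHEET)))
```

Either route hands the processor the same logical content here; the question raised in this thread is what happens when the component that builds the tree decides otherwise.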
The specs don't dictate these implementations, only expected behaviors. To be a conforming XSLT/XPath application of XML, your application has to behave as if it has implemented the node tree and template processing model that is specified.

A problem arises because the specs, by not dictating that the trees have to be structured a certain way (i.e., as if they were based on well-formed XML documents), leave room for interpretation on things like these whitespace issues.

The DOM may have some redeeming values, but it does leave open the nasty possibility of having created a tree that contains things that can't be in a well-formed XML document, either by the tree's structure (e.g., multiple document elements) or by its contents (e.g., illegal characters). And then there is this situation where the XML parser that created the DOM version (or even the SAX event stream version) of a logical XML document might have rightly exercised its option to ignore whitespace where it didn't find xml:space="preserve".

So I believe that there is an oversight in the XSLT spec with regard to xsl:text and whitespace preservation. Given that whitespace may be stripped or otherwise mangled before the tree is created, the usefulness of preserving whitespace in xsl:text is pretty limited. It means that things like

  <xsl:text> </xsl:text>
  <xsl:text>	 </xsl:text>

cannot be relied upon and should be deprecated in favor of

  <xsl:value-of select="	 "/>

I am speculating here, but it looks like this part of the XSLT spec may have been written under the assumption that not only would the stylesheet always be a well-formed XML document ("A transformation in the XSLT language is expressed as a well-formed XML document" and "Normally an XSLT stylesheet is a complete XML document"), but also that this document, when used as the basis for the stylesheet node tree, would be parsed such that whitespace specified in the document would be preserved, at least in xsl:text elements.
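The stripping scenario described above can be sketched in Python. This is not MSXML's behavior or code; `strip_whitespace_text` is a hypothetical stand-in for any loader that discards whitespace-only text nodes before the XSLT processor sees the tree. The character references `&#9;&#10;` and the quoted string literal in the `select` attribute are assumptions made for illustration, not a quotation of the examples in the message.

```python
from xml.dom import minidom

# Hypothetical stylesheet: the same tab + newline written once as
# xsl:text content and once as a string literal in a select attribute.
STYLESHEET = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<xsl:template match="/">'
    '<xsl:text>&#9;&#10;</xsl:text>'
    '<xsl:value-of select="\'&#9;&#10;\'"/>'
    '</xsl:template>'
    '</xsl:stylesheet>'
)


def strip_whitespace_text(node):
    """Assumed loader step: remove every whitespace-only text node."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_text(child)


def survivors(source):
    """Return (number of children left in xsl:text, select attribute value)."""
    doc = minidom.parseString(source)
    strip_whitespace_text(doc)
    text_el = doc.getElementsByTagName("xsl:text")[0]
    value_el = doc.getElementsByTagName("xsl:value-of")[0]
    return len(text_el.childNodes), value_el.getAttribute("select")


if __name__ == "__main__":
    remaining, select = survivors(STYLESHEET)
    print(remaining)      # children left under xsl:text after stripping
    print(repr(select))   # attribute value as seen in the tree
```

In this sketch the text node under xsl:text is gone after stripping, while the attribute value is untouched, because attribute values are not text nodes and a stripping pass of this kind never visits them.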
If that's not the case, then I'm surprised this hasn't come up before, and I think it warrants some clarification, especially since the gurus on this list have been known to advocate using xsl:text to insert whitespace in the result tree.

Also, there are other loopholes in the DOM. DOM implementations let you throw any characters you want into the tree, not just XML-allowed characters. They also let you haphazardly construct trees that are different than what you could get by parsing an XML document. XSLT and XPath do very little to acknowledge or prescribe how these situations should be handled, especially for the stylesheet tree.

-Mike

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list