[xsl] Ignorable whitespace and XSLT

Subject: [xsl] Ignorable whitespace and XSLT
From: "Christian Roth" <roth@xxxxxxxxxxxxxx>
Date: Mon, 21 Feb 2005 15:48:05 +0100

Browsing the 'Result still indented despite indent="no"' thread and
stumbling just today over that very MSXML whitespace stripping issue, I
wondered about the following test case involving ignorable whitespace.

By "ignorable whitespace" I mean whitespace that can safely be ignored
as a pretty-printing artefact. It can be identified by reading the XML
document's associated DTD and verifying that the element at hand does
not have a mixed content model, i.e. it may not contain PCDATA content.

I also assume that the underlying XML parser builds a DOM (or delivers
SAX events) that correctly identifies ignorable whitespace nodes
(which it can do when it reads the associated DTD).

Here are my test files:

<!ELEMENT root (leaf)*>

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE root SYSTEM "file:///Users/chris/Desktop/igwstest/simple.dtd">

<?xml version='1.0' encoding='iso-8859-1'?>
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/
<xsl:output method='xml' encoding='iso-8859-1' indent='no'/>

<xsl:template match="root">
<output><xsl:value-of select="child::node()[1]"/></output>

The result, using Saxon 8.1 with an underlying Xerces parser, is:

<?xml version="1.0" encoding="iso-8859-1"?><output>

Note that the ignorable whitespace node between the opening tag <root>
and the opening tag <leaf> is considered a significant node in XSLT
processing - it is still there.
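
One way to confirm what kind of node child::node()[1] actually selects
is a small diagnostic stylesheet. This is only a sketch and not part of
the test files above; the <output> wrapper and the match on "root" are
taken from the test stylesheet, while the three report strings are
made up for illustration:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="iso-8859-1" indent="no"/>

  <xsl:template match="root">
    <output>
      <xsl:choose>
        <!-- First child is a text node consisting only of whitespace -->
        <xsl:when test="child::node()[1][self::text()][normalize-space(.) = '']">
          <xsl:text>whitespace-only text node</xsl:text>
        </xsl:when>
        <!-- First child is an element: report its name -->
        <xsl:when test="child::node()[1][self::*]">
          <xsl:text>element: </xsl:text>
          <xsl:value-of select="name(child::node()[1])"/>
        </xsl:when>
        <!-- Comment, processing instruction, other text, or no child -->
        <xsl:otherwise>
          <xsl:text>other</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </output>
  </xsl:template>
</xsl:stylesheet>
```

If the whitespace between <root> and <leaf> reaches the XSLT data
model, the first branch should fire; if it has been stripped, the
second one should.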

Is this intended behaviour, and if so, why?

It makes sense to me to be able to declare elements with
<xsl:strip-space> for those cases where there is no DTD on the source
XML document and therefore no way for the parser to detect an ignorable
whitespace node. In this case, the XSLT author can use this mechanism
to provide that information to the XSLT processor "manually".
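
For reference, a minimal sketch of such a manual declaration, assuming
the element name "root" from the DTD above and a source document whose
<root> contains only <leaf> elements and whitespace:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="iso-8859-1" indent="no"/>

  <!-- Strip whitespace-only text nodes that are children of <root> -->
  <xsl:strip-space elements="root"/>

  <xsl:template match="root">
    <!-- With stripping in effect, child::node()[1] should select the
         first <leaf> element rather than a whitespace text node -->
    <output><xsl:value-of select="child::node()[1]"/></output>
  </xsl:template>
</xsl:stylesheet>
```

The elements attribute takes a whitespace-separated list of name tests,
so elements="*" would strip whitespace-only text nodes everywhere,
which is only safe if no element in the source has mixed content.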

But why is there the need to specify this explicitly for XML documents
that come with a DTD?

Regards, Christian.
