Nat,
At 01:09 AM 3/6/2009, you wrote:
> Thanks a lot for all your suggestions. If I am interpreting what you
> all are saying right, white space is handled differently depending on
> a few factors. 1) the transformer can follow different rules 2) the
> stylesheet can have rules that tell the transformer specifically how
> and when to handle white space.
That's essentially correct, with the refinement that in category (1),
all XSLT 1.0 engines should be the same (with one significant
exception), while the differences between XSLT 2.0 engines will be
more significant and require more attention.
For example, you might see in Saxon 9's command line interface that
the "-strip" argument can be used to switch whitespace handling. It
allows three values, 'all', 'none' and 'ignorable'. 'none' is the
proper XSLT 1.0 behavior; 'ignorable' is a 2.0 feature that requires
reference to a schema (a whitespace-only text node child of an
element that only allows element children is necessarily ignorable).
'all' would perhaps not be entirely conformant in 1.0, which says
that whitespace should never be stripped from input unless
xsl:strip-space says to do so. However, this arguably contradicts the
principle that an XSLT engine can accept input that has already been
subject to processing -- which might include whitespace munging --
and at least one widely-installed XSLT 1.0 engine, MSXML, will
ordinarily strip whitespace-only text nodes unless you take measures
to prevent it (such as setting preserveWhiteSpace on the document
before loading it). In 2.0, the rules are made more explicit (at least
as I read them :-) that such prior stripping is allowable, as well as
being clearer that if whitespace-only nodes (or indeed any input at
all) are gone before the XSLT engine even gets them, there's nothing
to be done about that in general.
The reason for this is that deciding whether whitespace-only nodes
make it into the source tree of a transform is really not the job of
the transformation at all, but rather of the processes preceding
transformation. While the source tree is commonly built by parsing an
XML document (in which ignorable whitespace nodes are ubiquitous), it
just as commonly is not (maybe the XML is stored in a database, or
generated dynamically).
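
A minimal sketch of that point, in Python rather than XSLT and using
only the standard library's xml.dom.minidom. The sample document and
the helper names are made up for illustration. It shows that a plain
parse keeps whitespace-only text nodes, and that a blanket strip
applied before any transform ever runs also removes the one space
(between the b and i elements) that a reader would miss.

```python
from xml.dom import minidom

# Hypothetical sample: indentation between elements, plus one space
# between </b> and <i> that matters to the reader.
SOURCE = """<doc>
  <title>Whitespace</title>
  <para>Some <b>bold</b> <i>italic</i> text.</para>
</doc>"""

XML_WHITESPACE = " \t\r\n"  # the four characters XML treats as whitespace


def whitespace_only_nodes(node):
    """Yield each text node under `node` that holds only XML whitespace."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip(XML_WHITESPACE) == "":
                yield child
        else:
            yield from whitespace_only_nodes(child)


def strip_whitespace_only(node):
    """Remove those nodes in place and return how many were removed."""
    found = list(whitespace_only_nodes(node))
    for text in found:
        text.parentNode.removeChild(text)
    return len(found)


doc = minidom.parseString(SOURCE)
before = len(list(whitespace_only_nodes(doc)))
removed = strip_whitespace_only(doc)
after = len(list(whitespace_only_nodes(doc)))
para_xml = doc.getElementsByTagName("para")[0].toxml()

print(before, removed, after)
print(para_xml)
```

Whatever tree is left after a step like this is all that a transform
gets to see, which is why a blanket strip is risky for mixed content.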
A further refinement is that the controls offered in XSLT itself,
xsl:strip-space and xsl:preserve-space (even leaving aside coding
idioms such as select="*" to work around any text node children), are
generally sufficient to the task, at least assuming you are working
to a more or less fixed document type. If this is the case, then as long
as you're starting with serialized XML source (and not working within a
database or pipeline architecture), the schema support and other
options that come in 2.0 can be viewed as a complication as well as a
convenience; there's nothing wrong with simply planning that no
whitespace-only nodes will be stripped (and then perhaps ensuring
they're not), and managing it all from within the XSLT. That's a
prudent and workable approach, and requires only that you Know Your
Tree, which is Rule #1 for writing reliable XSLT code in any case.
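
One way to get to know a tree, sketched in Python with the standard
library only: list the elements that have element children and, in
this one sample, never carry real text, since those are the natural
candidates for xsl:strip-space. The sample document and the function
name are hypothetical, and a single instance is only evidence; the
DTD or schema for the document type remains the authority.

```python
from xml.dom import minidom

# Hypothetical sample document; substitute a representative instance.
SOURCE = """<doc>
  <title>Whitespace</title>
  <para>Some <b>bold</b> <i>italic</i> text.</para>
</doc>"""

XML_WHITESPACE = " \t\r\n"


def strip_space_candidates(doc):
    """Return sorted names of elements that have element children and,
    in this document, never a text child with non-whitespace content."""
    has_children = set()
    has_text = set()
    stack = [doc.documentElement]
    while stack:
        elem = stack.pop()
        for child in elem.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                has_children.add(elem.tagName)
                stack.append(child)
            elif child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
                if child.data.strip(XML_WHITESPACE):
                    has_text.add(elem.tagName)
    return sorted(has_children - has_text)


doc = minidom.parseString(SOURCE)
candidates = strip_space_candidates(doc)
# Format the result as the declaration one might paste into a stylesheet.
declaration = '<xsl:strip-space elements="%s"/>' % " ".join(candidates)
print(declaration)
```

Here para is left out because it has mixed content, so the space
between its inline children survives.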
This is a complicated issue in part because serializers are also
allowed to add cosmetic whitespace ... which then may have to be
recognized as such by downstream processors....
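
The serializer side can be seen with a small Python sketch, again
assuming only the standard library; minidom's toprettyxml stands in
here for any indenting serializer, and the sample input is made up.
The compact source has no whitespace-only text nodes, but the
pretty-printed output does once it is parsed again.

```python
from xml.dom import minidom

# Hypothetical input with no whitespace between elements.
COMPACT = "<doc><title>Whitespace</title><para>Text</para></doc>"
XML_WHITESPACE = " \t\r\n"


def count_whitespace_only(node):
    """Count text nodes under `node` that hold only XML whitespace."""
    total = 0
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip(XML_WHITESPACE) == "":
                total += 1
        else:
            total += count_whitespace_only(child)
    return total


original = minidom.parseString(COMPACT)
pretty = original.toprettyxml(indent="  ")  # serializer adds indentation
reparsed = minidom.parseString(pretty)      # downstream parser sees it

count_before = count_whitespace_only(original)
count_after = count_whitespace_only(reparsed)
print(count_before, count_after)
```

A downstream process cannot tell from the nodes alone whether that
whitespace was authored or cosmetic.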
I'm sure that David, Mike, Ken, Tony or someone will weigh in if this
summary isn't entirely accurate. :-)
Cheers,
Wendell
======================================================================
Wendell Piez mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================