Re: [xsl] Selecting First Direct Sibling

Subject: Re: [xsl] Selecting First Direct Sibling
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 21 Aug 2007 11:48:43 -0400
Wasiq,

At 10:24 AM 8/21/2007, Hugh wrote:
In my classes, there is typically at least one student who is surprised by this.

Everything in the input XML creates a node, even whitespace (tabs, spaces, line ends such as LF/CR), unless you take explicit steps to strip whitespace that is not important to your processing. A text node IS an actual node whose value is the text, even if that text is only whitespace.

This is the old markup bugbear, the question of "when is whitespace really whitespace, and when is it only whitespace?"


XML does try in certain contexts to distinguish between "significant" whitespace (that is, space that's actually part of the data content of the document) and "insignificant" whitespace (space that's only there to make the marked-up version more legible, and is not meant to be respected by an application).

Typically this distinction relies on a schema. For example, whitespace inside a paragraph should be significant, including what XSLT treats as whitespace-only text nodes (such as the single space between the term and emph elements here):

<sec>
  <p>In my <term>paragraph</term> <emph>I need my whitespace!</emph></p>
</sec>

... but whitespace between paragraphs (or here, inside the 'sec' but not the 'p') is not. We can tell this from a DTD or schema that declares p elements to have #PCDATA content or the equivalent, whereas presumably the sec does not allow #PCDATA.
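
To make the distinction concrete, here is a rough sketch of what such declarations might look like, written as an internal DTD subset around the same sample. The content models are my assumption for illustration only (in particular that sec holds nothing but p elements); they are not taken from any real schema.

```xml
<!DOCTYPE sec [
  <!-- Element-only content: no #PCDATA allowed, so whitespace
       between the p elements is only there for legibility -->
  <!ELEMENT sec (p+)>
  <!-- Mixed content: #PCDATA is allowed, so every space inside p
       counts as data, including the one between term and emph -->
  <!ELEMENT p (#PCDATA | term | emph)*>
  <!ELEMENT term (#PCDATA)>
  <!ELEMENT emph (#PCDATA)>
]>
<sec>
  <p>In my <term>paragraph</term> <emph>I need my whitespace!</emph></p>
</sec>
```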

But not everyone has a schema, and some folks don't even want one.

Accordingly, the XSLT rule (sometimes observed in the breach by certain well-known vendors) is to preserve all whitespace-only text nodes unless the stylesheet says to strip them, using the xsl:strip-space and xsl:preserve-space top-level elements.

If you want to be both clean and safe with your whitespace, you'll trust your processor to do this, and name in xsl:strip-space only those elements that have element-only content (as specified in a schema to which your input is known to be valid).
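
For the sample above, a minimal sketch of that declaration could look like the following. It assumes 'sec' is the only element with element-only content; a real stylesheet would list every such element from its own schema, separated by spaces.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Strip whitespace-only text nodes that are children of sec.
       Elements not named here (such as p) keep all their whitespace,
       because preserving is the default. -->
  <xsl:strip-space elements="sec"/>

</xsl:stylesheet>
```

The broader alternative, elements="*" combined with xsl:preserve-space for the mixed-content elements, also works, but it is only safe if you remember to list every element where whitespace matters.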

Then, expressions like node() will work the way you want. They'll select whitespace-only nodes, as always (they're nodes too) -- but the ones you don't care about will have been stripped from your source tree.
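
Here is a small sketch showing the effect on the "first sibling" question from this thread. The input document and the template are made up for illustration; only xsl:strip-space and the node() test come from the discussion above. Assume this input:

```xml
<sec>
  <p>One</p>
  <p>Two</p>
</sec>
```

and this stylesheet:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Remove whitespace-only text nodes directly inside sec -->
  <xsl:strip-space elements="sec"/>

  <xsl:template match="/sec">
    <!-- Look at the first node of any kind after the first p -->
    <xsl:for-each select="p[1]/following-sibling::node()[1]">
      <xsl:choose>
        <xsl:when test="self::*">element: <xsl:value-of select="name()"/></xsl:when>
        <xsl:when test="self::text()">text node</xsl:when>
        <xsl:otherwise>other node</xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the xsl:strip-space line in place, a conforming processor should report "element: p", since the line break and indentation between the two p elements have been stripped from the source tree. Take that line out and the same expression should report "text node" instead, which is the surprise that started this thread.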

Cheers,
Wendell

----- Original Message ----- From: "Wasiq Shaikh" <wasiq911@xxxxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Tuesday, August 21, 2007 9:01 AM
Subject: RE: [xsl] Selecting First Direct Sibling


Oh I see. I had always thought that node() would select an actual node or text. Didn't think it would select blank spaces.


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
