Re: [xsl] normalize-space() except ...

Subject: Re: [xsl] normalize-space() except ...
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 11 Mar 2015 14:33:29 -0000

On Tue, Mar 10, 2015 at 5:40 PM, Flynn, Peter pflynn@xxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> I do almost exactly this in several applications. I think it's fairly
> common.
>>     watch for
>>     <p>The man wore<i> black </i>socks</p>
>>     which is not unlikely in XML made from word processing software.
> Slightly more common would be <p>The man wore <i>black </i>socks</p>
> where a double-click highlight in the WP software included the trailing
> space on the word (someone just told me Word has just stopped doing
> this: can anyone confirm?).
> More pernicious is the erroneous elision of white-space-only nodes in
> mixed content:
> <p>The man wore <b>black socks</b> <i>only</i> on Tuesdays.</p>
> resulting in "The man wore black socksonly on Tuesdays." due to a faulty
> xsl:strip-space (white-space-only nodes between subelements in mixed
> content should probably never be removed, which is sometimes hard to
> explain to people unaccustomed to document-class XML).


Usefully, current versions of Saxon offer the option of referring to
a DTD or schema to determine where stripping of whitespace-only text
nodes is safe (i.e., not in mixed content). But this is on the
boundaries of XSLT (which doesn't say much about how inputs may be
pre-processed), and not standardized AFAIK.
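
Without such an option, the portable fallback is to name the
element-only containers by hand in xsl:strip-space and leave
everything else alone. A minimal sketch, assuming a vocabulary in
which 'doc', 'section' and 'list' (hypothetical names, not from any
particular schema) never contain text directly:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical element names: 'doc', 'section' and 'list' are
       assumed to have element-only content, so whitespace-only text
       nodes directly inside them are indentation and can be dropped. -->
  <xsl:strip-space elements="doc section list"/>

  <!-- Mixed-content elements such as 'p' are deliberately not listed.
       Preserving is the default, so the lone space between </b> and
       <i> in the example above survives. -->

  <!-- Identity template: copy everything else unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The failure Peter describes typically comes from elements="*", which
also strips inside mixed content.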

For many projects, having an XSLT that does nothing but normalize
whitespace can be useful. Such an XSLT needs to make distinctions
between three types of elements: those that contain elements only;
those that contain text-only or mixed content; and those such as HTML
'pre' where all whitespace is significant (not only as "white space"),
or descendants of those. (That is, in HTML, pre/b works differently
from p/b.) However, it's a different order of problem to generalize
this transformation across document types; the logic will be different
based on your authority for these distinctions (whether schema
or data set), as well as what you actually consider to be "pretty" in
the result.
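
To make the three-way distinction concrete, here is a rough XSLT 1.0
sketch, not a general solution. The element names 'doc', 'section'
and 'list' are hypothetical stand-ins for the element-only
containers, and 'pre' stands for whatever your document type treats
as whitespace-sensitive. In mixed content it only collapses runs of
whitespace to a single space; it never removes whitespace at the
edge of a text node, since that is where the black-socks cases go
wrong.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Type 1: element-only containers (hypothetical names).
       Whitespace-only text nodes inside them are dropped on input. -->
  <xsl:strip-space elements="doc section list"/>

  <!-- Default: copy nodes and attributes through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Type 2: text in text-only or mixed content.
       Padding with 'x' on both sides stops normalize-space() from
       trimming leading and trailing whitespace, so each run collapses
       to one space but is never removed. The explicit priority avoids
       a conflict with node() in the identity template. -->
  <xsl:template match="text()" priority="1">
    <xsl:variable name="padded"
        select="normalize-space(concat('x', ., 'x'))"/>
    <xsl:value-of
        select="substring($padded, 2, string-length($padded) - 2)"/>
  </xsl:template>

  <!-- Type 3: whitespace-sensitive elements and all their descendants.
       xsl:copy-of copies the subtree verbatim, so the text() template
       above is never applied inside 'pre'. -->
  <xsl:template match="pre">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

With this, <i> black </i> keeps a space on each side, and a
whitespace-only node between </b> and <i> comes out as one space.
Trimming at the start and end of a paragraph, or moving spaces out
of inline elements, is where the document-type-specific decisions
come in, so the sketch leaves both alone.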

Cheers, Wendell

Wendell Piez |
XML | XSLT | electronic publishing
Eat Your Vegetables
