Re: [xsl] Trimming (formatting-only) leading tabs/spaces from XSLT - issues?

From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 07 Jun 2011 14:46:21 -0400

Dear Phil,

On 6/7/2011 1:42 PM, Philip Fearon wrote:
In XML (and SGML, including SGML-conformant HTML), I think this means you
can follow a schema -- significant whitespace is anywhere character data is
permitted. Regrettably, this means that all whitespace (outside tags) is
significant when there is no schema. (Whether you can take a schema to be
implicit when it is not given is another problem.)

Whether XML (or HTML) fragments embedded in XSLT can be taken to reference a
schema depends, I'm afraid, on the XSLT: it won't always be true.
Conservatively, we might say it's never definitively true except when a
schema is specifically assigned using xsl:import-schema and
xsl:result-document/@validation='strict'. But I suppose an application might
also let a user declare such a binding by other means.

Yes, schema-binding within the XSLT is critical to the editor's
features and the rest of the system, and is performed externally,
mainly because this (hypothetical product) is a batch-processing
system, with an editor for convenience. I've found issues in the 'non
schema-aware' scenario, because literal-result elements don't have
full context (though there may be some ancestor element to provide an
interrupted validation 'path' that is exploited if available) and can
be ambiguous unless defined globally in the schema. This (incomplete)
PSVI data is currently only used for partial validation of
literal-result elements/attributes and auto-completion, but it makes
sense to exploit this for trimming purposes, when available.

This sounds reasonable to me. Or, it does not sound unreasonable, which may be almost the same thing.

Finally, I think it's important to distinguish whitespace handling
in tag-formatting applications from the way whitespace may, or may not, be
collapsed, re-flowed or munged for display in a receiving application. These
are two different issues that are frequently confused. The fact that some
tag-formatting applications may (usefully) reformat whitespace in some
places where it is not entirely stripped -- perhaps on the grounds that
receiving applications will be doing likewise, so it doesn't matter -- makes
for another set of troublesome blurry cases.

I think I follow you (though I may have read this wrong!). So in this case, the tag-formatting is the stripping of whitespace and the receiving application is the editor's display and the hardcopy-rendering system for printing (both of which use line-by-line margins, not characters to auto-indent - which is why such characters must be removed first).

I think you're reading the distinction correctly.

A developer can then, by changing the view (in
the receiving application), choose a preferred indentation style or
select none at all without affecting a single character. Hopefully
there isn't a blurry case once you can say that all characters in the
XSLT are there on merit (though some will still fulfill a text
formatting role, say for XPath), not because an XML 'pretty-print'
system needs them (I hope this doesn't sound emotional - it's not meant

Fortunately, the XSLT specs are clear on this point: whitespace-only text nodes in a stylesheet have no semantics that an XSLT processor is bound to respect, outside xsl:text, while it is bound to respect any whitespace mixed with non-whitespace text.

Because whitespace-only text nodes outside xsl:text can have no significance in a stylesheet as processed, the law of parsimony argues that an editing application might similarly be allowed to do with them what it will (or do away with them as the case may be).
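To make the rule concrete, here is a minimal sketch in Python (standard-library xml.dom.minidom only) of what such an editing application might do: walk a stylesheet and drop whitespace-only text nodes unless they sit inside xsl:text. The function name trim_stylesheet and the sample stylesheet are hypothetical, invented for illustration. As an added assumption beyond the paragraph above, the sketch also leaves whitespace alone wherever xml:space="preserve" is in effect, since the XSLT whitespace-stripping rules honor that attribute too. CDATA sections, comments and processing instructions are not touched.

```python
from xml.dom import minidom, Node

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
WHITESPACE = " \t\r\n"  # the four XML whitespace characters


def trim_stylesheet(node, preserve=False):
    """Remove whitespace-only text nodes below `node`, in place.

    Text is left alone inside xsl:text and (an assumption of this sketch)
    wherever xml:space="preserve" is in effect. Text that mixes whitespace
    with other characters is never modified.
    """
    for child in list(node.childNodes):  # copy: we mutate while iterating
        if child.nodeType == Node.TEXT_NODE:
            if not preserve and child.data.strip(WHITESPACE) == "":
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            is_xsl_text = (child.namespaceURI == XSL_NS
                           and child.localName == "text")
            space = child.getAttribute("xml:space")
            if space == "preserve":
                keep = True
            elif space == "default":
                keep = False
            else:
                keep = preserve  # inherit from the nearest ancestor
            trim_stylesheet(child, keep or is_xsl_text)


# Hypothetical sample stylesheet, indented purely for readability.
SAMPLE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <p>
      Hello, <xsl:value-of select="name"/>
      <xsl:text> </xsl:text>
      <b>world</b>
    </p>
  </xsl:template>
</xsl:stylesheet>"""

doc = minidom.parseString(SAMPLE)
trim_stylesheet(doc)
result = doc.documentElement.toxml()
print(result)
```

In the sample, the indentation between elements disappears and the single space inside xsl:text survives. The text node that begins with a line break and ends with "Hello, " is kept whole, leading whitespace included, because it mixes whitespace with non-whitespace text and a processor is bound to respect it.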


Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.      
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
