Re: [xsl] Trimming (formatting-only) leading tabs/spaces from XSLT - issues?

From: Philip Fearon <pgfearo@xxxxxxxxxxxxxx>
Date: Tue, 7 Jun 2011 18:42:38 +0100
On Tue, Jun 7, 2011 at 4:59 PM, Wendell Piez <wapiez@xxxxxxxxxxxxxxxx> wrote:
> Phil,
> You are entirely correct that this is an emotional issue. Paradoxically,
> people get most upset when what seems like "correct" and "common sense" to
> them is ignored or defied in favor of some other method whose logic is
> obscure. (For whatever reason, it seems people have less patience with
> unknowns they believe to be knowable and controvertible than they do with
> the apparently unknowable and incontrovertible. And machines are supposed to
> be knowable.)

Understanding this, I will try to tread carefully...
> So the first rule is, make it possible to turn off the behavior, and if
> other features depend on it, make those dependencies very clear.

Yes. Though trimming is more or less the opposite of 'pretty-print', it
has similar characteristics, so I will look at existing practices for
warning users and provide progressive behavior that can be undone.
> On the issue of whitespace in XML, this is one of the most vexing areas,
> largely because many people don't know what the rules are -- but (and, or)
> do have their own notions of what's right. The rules you enumerate are a
> start, except where they blur (such as #3 -- in the XSLT namespace, for
> example, the 'text' element is sacrosanct, but in the TEI namespace it
> follows rule #1). Trouble will start with the blurry cases if it hasn't
> already.
> Accordingly, I think the second rule is to be very conservative.

I will try.
> In XML (and SGML, including SGML-conformant HTML), I think this means you
> can follow a schema -- significant whitespace is anywhere character data is
> permitted. Regrettably, this means that all whitespace (outside tags) is
> significant when there is no schema. (Whether you can take a schema to be
> implicit when it is not given is another problem.)
> Whether XML (or HTML) fragments embedded in XSLT can be taken to reference a
> schema depends, I'm afraid, on the XSLT: it won't always be true.
> Conservatively, we might say it's never definitively true except when a
> schema is specifically assigned using xsl:import-schema and
> xsl:result-document/@validation='strict'. But I suppose an application might
> also let a user declare such a binding by other means.
Yes, schema-binding within the XSLT is critical to the editor's
features and the rest of the system, and is performed externally,
mainly because this (hypothetical product) is a batch-processing
system, with an editor for convenience. I've found issues in the
'non-schema-aware' scenario, because literal-result elements don't have
full context (though there may be some ancestor element to provide an
interrupted validation 'path' that is exploited if available) and can
be ambiguous unless defined globally in the schema. This (incomplete)
PSVI data is currently only used for partial validation of
literal-result elements/attributes and auto-completion, but it makes
sense to exploit this for trimming purposes, when available.
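
As a minimal sketch of the conservative end of this (not the editor's
actual implementation), the Python below removes only whitespace-only
text nodes from a stylesheet, which an XSLT processor strips from the
stylesheet anyway, and leaves xsl:text content and anything under
xml:space="preserve" alone. The function name
trim_formatting_whitespace and the sample stylesheet are hypothetical.
No schema information is used, so indentation that sits inside mixed
text is deliberately left untouched.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
XML_WS = " \t\r\n"  # the four XML whitespace characters


def trim_formatting_whitespace(elem, preserve=False):
    """Drop whitespace-only text nodes below elem, in place.

    Hypothetical helper: text that contains any non-whitespace
    character is never modified.
    """
    # xml:space is inherited; the nearest declaration wins.
    space = elem.get(XML_SPACE)
    if space is not None:
        preserve = space == "preserve"
    # Content of xsl:text is always kept exactly as written.
    keep = preserve or elem.tag == "{%s}text" % XSL_NS

    if not keep and elem.text is not None and not elem.text.strip(XML_WS):
        elem.text = None
    for child in elem:
        trim_formatting_whitespace(child, preserve)
        # A child's tail is a text node in elem's content, not the child's.
        if not keep and child.tail is not None and not child.tail.strip(XML_WS):
            child.tail = None


# Hypothetical sample stylesheet with indentation, xsl:text and xml:space.
SAMPLE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <p>
      <xsl:text>  </xsl:text>
      <b xml:space="preserve"> <i/> </b>
      kept text
    </p>
  </xsl:template>
</xsl:stylesheet>"""

root = ET.fromstring(SAMPLE)
trim_formatting_whitespace(root)
print(ET.tostring(root, encoding="unicode"))
```

In this sketch the indentation between elements disappears, while the
two spaces in xsl:text, the spaces inside the xml:space="preserve"
element, and the line containing "kept text" (including its leading
indentation) all survive. Trimming that last case safely is where the
schema information would be needed.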

> Finally, I think it's important to distinguish between whitespace handling
> in tag-formatting applications from the way whitespace may, or may not, be
> collapsed, re-flowed or munged for display in a receiving application. These
> are two different issues that are frequently confused. The fact that some
> tag-formatting applications may (usefully) reformat whitespace in some
> places where it is not entirely stripped -- perhaps on the grounds that
> receiving applications will be doing likewise, so it doesn't matter -- makes
> for another set of troublesome blurry cases.

I think I follow you (though I may have read this wrong!). So in this
case, the tag-formatting is the stripping of whitespace and the
receiving application is the editor's display and the
hardcopy-rendering system for printing (both of which use line-by-line
margins, not characters, to auto-indent - which is why such characters
must be removed first). A developer can then, by changing the view (in
the receiving application), choose a preferred indentation style or
select none at all without affecting a single character. Hopefully
there isn't a blurry case once you can say that all characters in the
XSLT are there on merit (though some will still fulfill a text
formatting role, say for XPath), not because an XML 'pretty-print'
system needs them (I hope this doesn't sound emotional - it's not meant

