Re: [xsl] XSLT streaming: the processor "remembers" things as it descends the XML tree?

Subject: Re: [xsl] XSLT streaming: the processor "remembers" things as it descends the XML tree?
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxx>
Date: Fri, 22 Nov 2013 10:25:27 -0500
Dear Roger,

I am not a computer scientist, so I am talking through my hat.

But there are real experts on this list, so I am confident that
whatever I say wrong will be corrected. :-)

On Fri, Nov 22, 2013 at 5:00 AM, Costello, Roger L. <costello@xxxxxxxxx> wrote:
> I think the discussion is zeroing in on the keys to
> XML-design-for-streaming. That's exciting!
> Wendell, you made these two fantastic statements:
>         (1) Streaming is useful when branches of
>         the document can be processed without
>         reference to other branches.
>         (2) The smaller your branches (each of
>         which offers a discrete processing context),
>         the more you benefit.
> I understand (1). But I am not clear how (2) logically follows from (1).
> Would you explain further please?

It's pretty simple. The main benefit of streaming is that it optimizes
memory management in advance of processing, by indicating what the
machine needs to "keep in mind" (i.e., the scope of the information
set it has to accommodate) statically.

Smaller branches = less scope = more savings on memory.

On the other hand, I think the difference between a branch of 100
nodes vs a branch of 50 nodes is pretty negligible compared to a
branch of thousands of nodes (i.e. a whole document, big chunks of
documents, or aggregations of documents).
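
To make the scope idea concrete, here is a rough sketch in Python rather
than XSLT. It is only an analogy, not how any XSLT processor works
internally, and the element names (log, record, amount) are made up for
illustration. The first function holds the whole tree; the second only
ever needs one small branch "in mind" at a time.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: a flat run of small <record> branches.
SAMPLE = (
    "<log>"
    + "".join(f"<record><amount>{n}</amount></record>" for n in range(1, 6))
    + "</log>"
).encode("utf-8")


def total_whole_tree(xml_bytes):
    """Non-streaming: parse the entire document, then walk the full tree."""
    root = ET.fromstring(xml_bytes)
    return sum(int(amount.text) for amount in root.iter("amount"))


def total_branch_at_a_time(xml_bytes):
    """Streaming-like: handle each <record> as it closes, then drop it."""
    total = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "record":
            total += int(elem.findtext("amount"))
            # Discard this branch's children and text. The emptied
            # <record> shell stays attached to the root element.
            elem.clear()
    return total
```

Both functions give the same answer. The difference is only in how much
of the document has to be held at once, which is the whole point.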

Also, processing big branches (documents or collections) together
without streaming is *not* a bad thing to do in itself. It's all about
the tradeoff of available resources, including knowhow and time to

> You are saying, "Flatter is better" when it comes to
> XML-design-for-streaming, right? Why? I sense that may be right, but I can't
> articulate why, I am hoping you can.

Actually I'm not saying that. I don't think 50 nodes deep is likely to
be all that bad, and outside the laboratory that's probably deeper
than you'd ever go anyway.

Moreover, philosophically I'm somewhat opposed to designing XML for
streaming or for any particular process or processing paradigm. This
is sometimes necessary, but more often I think it would be optimizing
in the wrong place (which is to say, not really optimizing at all).

Indeed, the entire beauty and power of XML is that it doesn't have to
be flat. We can already do flat. But the world is not flat --
sometimes not even when it looks flat.

Instead, I think the complexity of the data design should reflect the
complexity of the information. In designing XML, the judicious use of
wrappers even when they're formally redundant is one of the best
things you can do to ease the problem of managing complexity.

HTML, for example, is pretty darn flat, but at least it has ul/li not
just li. One might argue that we should infer the existence of lists
from contiguous 'li' elements. What a headache that would be. (And
there are many systems like this.)
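
A small sketch of the difference, again in Python and with made-up
sample markup. With a wrapper, finding the lists is a lookup. Without
one, the lists have to be inferred from runs of adjacent 'li' siblings
(here with itertools.groupby), and that inference is an assumption about
the data, not something the markup states.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical samples: the same content with and without a list wrapper.
WRAPPED = "<body><p>Intro</p><ul><li>a</li><li>b</li></ul><p>Outro</p></body>"
BARE = "<body><p>Intro</p><li>a</li><li>b</li><p>Outro</p></body>"


def lists_from_wrappers(xml_text):
    """The wrapper says where each list starts and ends."""
    root = ET.fromstring(xml_text)
    return [[li.text for li in ul.findall("li")] for ul in root.iter("ul")]


def lists_from_adjacency(xml_text):
    """Guess the lists from runs of contiguous <li> children of the root."""
    root = ET.fromstring(xml_text)
    lists = []
    for is_item, run in groupby(root, key=lambda child: child.tag == "li"):
        if is_item:
            lists.append([li.text for li in run])
    return lists
```

Note what the second version cannot do: two lists that happen to be
adjacent are indistinguishable from one longer list. The wrapper carries
information that the flat form has thrown away.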

> What certainly seems to be true is this, "Design XML such that processing
> its nodes does not require knowledge of any preceding nodes" (Ken, excellent
> observation about preceding axis). But what does that mean, practically? What
> practical guidelines could be written to achieve that design goal?

I think this is way too general. Streaming is useful, but it's not in
itself a good for which other goods should be sacrificed.

Would you be comfortable if someone said you can't have
cross-references in your user manual because they prevent you from
streaming in your production process?

I'd rather have the cross-references, and then if I need to stream, I
can figure out how to do that. (It's no problem if I can run two
passes. If I can't, I start to ask who's in charge.)
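
For what it's worth, here is roughly what I mean by two passes, sketched
in Python under assumptions of my own: the element and attribute names
(manual, section, xref, id, to) are hypothetical, and a real XSLT
pipeline would look different. The first pass keeps only a small
id-to-title table; the second pass resolves each cross-reference against
it, one section at a time, including references that point forward.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical manual in which each section refers to the other.
MANUAL = (
    "<manual>"
    '<section id="s1"><title>Setup</title><p>See <xref to="s2"/>.</p></section>'
    '<section id="s2"><title>Usage</title><p>See <xref to="s1"/>.</p></section>'
    "</manual>"
).encode("utf-8")


def collect_titles(xml_bytes):
    """Pass 1: remember only the id -> title table, not the sections."""
    titles = {}
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "section":
            titles[elem.get("id")] = elem.findtext("title")
            elem.clear()  # drop the section's content once it is recorded
    return titles


def resolve_xrefs(xml_bytes, titles):
    """Pass 2: resolve each xref using the table built in pass 1."""
    resolved = []
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "section":
            for xref in elem.iter("xref"):
                resolved.append((elem.get("id"), titles[xref.get("to")]))
            elem.clear()
    return resolved
```

Calling resolve_xrefs(MANUAL, collect_titles(MANUAL)) pairs each section
with the title it points to. The forward reference from s1 to s2 is the
one a single pass could not have resolved on the spot.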

Cheers, Wendell

Wendell Piez |
XML | XSLT | electronic publishing
Eat Your Vegetables
