Subject: Re: [xsl] XSLT streaming: the processor "remembers" things as it descends the XML tree?
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxx>
Date: Wed, 20 Nov 2013 19:04:12 -0500
Michael M-H, yes, you are right. Tables are a beast, and if you nest
them you will get lots of extra levels "for free". Another example is
MathML. I wonder what David C will see with that stylesheet.

On the other hand (with apologies) these counter-examples might serve
as demonstrations of the principle as much as they disprove it. It
depends on what we mean by "pathological" and whether mixing
vocabularies, or specialized data structures with heavy semantics
(such as table layout or math) necessarily take us into that
territory. :-)

Michael Sokolov --- this is totally a Balisage topic. I think it has
to do with XML's roots as a markup language (something a bit different
from a serialization for a data structure). In SGML, you can write

<a>1<a>2<a>3

and the processor will provide the end tags for you (assuming you are
configured properly to support this, of course). Whether it saw the
'a' elements as serial, or nested, would depend on your content model
for 'a'.

Cheers, Wendell

Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^

On Wed, Nov 20, 2013 at 5:21 PM, Michael Sokolov <msokolov@xxxxxxxxxxxxxxxxxxxxx> wrote:
> It's certainly an imaginable situation to have a very deep document, but
> very unusual. Is it for purely cultural reasons? For example in some
> information-theoretic sense, this document might be said to encode a
> sequence of nodes:
>
> <a>1<a>2<a>3</a></a></a>
>
> in much the same way as this one:
>
> <doc><a>1</a><a>2</a><a>3</a></doc>
>
> just along a different axis (and without the need for the wrapping <doc>
> element).
>
> For some reason nobody seems to do that though. I suspect there are
> structural impediments (other than the streaming), or is it purely
> convention?
>
> To address Roger's original question though, you really often *do* want the
> ancestor info to be retained while streaming. Consider chunking a large
> book in which you'd like to include pointers back to the original document
> structure, and metadata from enclosing sections. That was exactly my use
> case when I hand-coded streaming processing using SAX, and I retained not
> only the ancestors but also /ancestor::*/preceding-sibling::* (to capture
> the metadata)
>
> -Mike
>
> , but the consequence of *not* rem
>
> On 11/20/2013 04:00 PM, Dimitre Novatchev wrote:
>>
>> On Wed, Nov 20, 2013 at 7:25 AM, Wendell Piez <wapiez@xxxxxxxxxxxxxxx>
>> wrote:
>>>
>>> I think it would be very interesting to see a survey of how deep XML
>>> documents go in the wild. Except for pathological cases, I think they
>>> would rarely go beyond 20 deep. Of course this will vary a great deal
>>> by document type.
>>
>> I think that what Roger points out is useful: probably the concept of
>> "streaming" needs to be redefined and something needs to be specified
>> about a limit of the "maximum document depth".
>>
>> Ignoring this leaves us with what actually could serve as the base for
>> yet another type of DOS attack. Not only malicious, but just
>> accidental -- imagine a buggy program failing to write closing element
>> tags ...
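
To make the quoted points concrete, here is a rough Python sketch of a SAX-style pass that keeps the ancestor chain on a stack while streaming and refuses to descend past a configured depth. It is not Mike's hand-coded processor, only an illustration: the names AncestorTrackingHandler, DepthLimitExceeded and scan are made up for this sketch, and the default limit of 20 is simply the guess from the thread above. It retains only ancestor element names, not the preceding-sibling metadata Mike mentions.

```python
import xml.sax


class DepthLimitExceeded(Exception):
    """Raised when the document nests deeper than the configured limit."""


class AncestorTrackingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: keeps the open-element stack while streaming."""

    def __init__(self, max_depth=20):
        super().__init__()
        self.max_depth = max_depth  # assumed limit, not from any spec
        self.stack = []             # names of the currently open elements
        self.deepest = 0            # greatest depth seen so far
        self.paths = []             # (ancestor path, text) pairs

    def startElement(self, name, attrs):
        self.stack.append(name)
        depth = len(self.stack)
        if depth > self.max_depth:
            # Stop early instead of letting the stack grow without bound.
            raise DepthLimitExceeded(
                "depth %d exceeds limit %d" % (depth, self.max_depth))
        self.deepest = max(self.deepest, depth)

    def characters(self, content):
        # A parser may deliver long text in several chunks; that is
        # ignored here because the sample documents hold tiny strings.
        text = content.strip()
        if text:
            self.paths.append(("/".join(self.stack), text))

    def endElement(self, name):
        self.stack.pop()


def scan(xml_text, max_depth=20):
    """Parse xml_text in one streaming pass and return the handler."""
    handler = AncestorTrackingHandler(max_depth)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


if __name__ == "__main__":
    nested = scan("<a>1<a>2<a>3</a></a></a>")
    flat = scan("<doc><a>1</a><a>2</a><a>3</a></doc>")
    print(nested.deepest, nested.paths)
    print(flat.deepest, flat.paths)
```

With the two sample documents from Mike's message, the stack grows to three entries for the nested form and peaks at two for the flat one, so the memory held by the handler tracks depth rather than document length. That is the reason a depth cap is the natural guard here.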