Subject: [xsl] efficiently detecting size of input document?
From: Lars Huttar <huttarl@xxxxxxxxx>
Date: Wed, 13 Aug 2008 15:48:07 -0500
Hello,

This is basically a solved question... amazing how you find answers when
you're composing an email to this mailing list!

Maybe somebody working on a similar problem will find this helpful.
And I'd be interested in any suggestions for improvement.

I've been working on the default XSLT stylesheet that Firefox uses to
display an XML document to the user. I have something that provides a
couple of handy extra features[2], but it's relatively slow. That's no
problem for a 5KB XML document, but if you visit a URL and it turns out
to be a 10MB document, Firefox basically becomes unresponsive and starts
allocating large chunks of memory until it has processed the whole
document. The strategies I've thought of for fixing this are either:

(1) detect the doc size in advance, and if it's large, call a template
that presents a very no-frills view of the document; or (preferably)

(2) lay out the first couple of screenfuls of the document nicely, then
display a [More...] button to get the rest. Usually when I'm viewing XML
in a browser I just want to see what it looks like in general, or else the datum I'm looking for is in the first page or two.


For (1), the difficulty is that I don't know how to efficiently detect the
document size in XSLT. Sure, you could count(//*) or even sum string-length()
over the text nodes, but by the time you finished counting you would already
have lost any time savings you hoped to gain. And since the input could be a
SAX stream rather than a file, I imagine there really is no quick way to
predetermine the document size that is guaranteed to be available.

For (2), the challenge would be how to process the first [certain
amount] of the document and no more, again without inefficiently
counting nodes.

One idea was to use <xsl:number level="any">, something like

   <xsl:variable name="element-num">
     <xsl:number count="*" level="any"/>
   </xsl:variable>
   <xsl:if test="$element-num &lt; $element-limit">
     <!-- process this node and its descendants -->
     ...
   </xsl:if>
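Fleshed out as a template, the limiting logic might look roughly like the
sketch below (the $element-limit parameter and the HTML output are just
illustrative, not the actual stylesheet):

   <xsl:param name="element-limit" select="1000"/>

   <xsl:template match="*">
     <xsl:variable name="element-num">
       <xsl:number count="*" level="any"/>
     </xsl:variable>
     <xsl:if test="$element-num &lt; $element-limit">
       <!-- render this element, then recurse into its children -->
       <div class="element">
         <xsl:value-of select="name()"/>
         <xsl:apply-templates select="*"/>
       </div>
     </xsl:if>
     <!-- past the limit, the element and its whole subtree are skipped,
          since we never apply-templates to its children -->
   </xsl:template>

A nice side effect of skipping <xsl:apply-templates> for over-limit elements
is that their descendants never even pay the per-node cost of xsl:number.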

Depending on how xsl:number is implemented, this might be efficient; or it
might be O(n^2), where n is the element-limit: if each of the first n
invocations walks backwards over up to n preceding elements, the total work
is proportional to n^2.[1]

The way it's described in the XSLT Programmer's Reference (2nd ed), it
sounds like O(n^2): "Starting at the current node, the processor walks
backwards through the document in reverse document order, counting the
number of nodes that match the count pattern..."

I realize that this description is probably meant to specify the
semantics of xsl:number, not its implementation or performance
characteristics. So XSLT processors might not implement xsl:number such
that each invocation takes time proportional to the number of preceding
nodes. But for a statement as general as xsl:number, with arbitrary
count and from attributes, how likely is a processor to be smart enough
to implement my use case efficiently? It doesn't seem trivial.

But actually, I don't need the answer to that question. This is a
Firefox-specific application, so for practical purposes, all that really
matters here is how fast Firefox's processor (Trax) runs the stylesheet.
Well, there's one way to find out.


OK, I tried it, and it's not too bad! The 10MB document took about 3 seconds
to render the first 1000 elements. I think that's acceptable!

One might hope that a similar stylesheet could be used for
prettyprinting XML efficiently in other environments... but the present
problem is solved to my satisfaction.

Regards,
Lars

[1] Or in pathological cases, it could be O(s^2) where s is the size of the input document. I think.

[2] This version gives a broader area to click on to collapse/expand an element, and also provides a button to collapse/expand all children.
