Subject: [xsl] efficiently detecting size of input document?
From: Lars Huttar <huttarl@xxxxxxxxx>
Date: Wed, 13 Aug 2008 15:48:07 -0500
This is a basically solved question... amazing how you find answers when you're composing an email to this mailing list!
Maybe somebody working on a similar problem will find this helpful. And I'd be interested in any suggestions for improvement.
I've been working on the default XSLT stylesheet that Firefox uses to display an XML document to the user. I have something that provides a couple of handy extra features[2], but it's relatively slow. That's no problem for a 5KB XML document, but if you visit a URL and it turns out to be a 10MB document, Firefox basically becomes unresponsive and starts allocating large chunks of memory until it has processed the whole document. The strategies I've thought of for fixing this are either:
(1) detect the doc size in advance, and if it's large, call a template that presents a very no-frills view of the document; or (preferably)
(2) process only the first part of the document, up to some limit, and stop there.
For (1), the difficulty is that I don't know how to efficiently detect the doc size in XSLT. Sure, you could count(//*) or even sum the string-length() of the text nodes, but by the time you were done counting you would already have lost any time savings you hoped to gain. I would imagine that since the input could be a SAX stream rather than a file, there really is no quick way to predetermine the document size that is guaranteed to be available.
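For concreteness, here is a minimal sketch of what such a size check would look like (the "plain-view" template name and the 10000-element threshold are my own illustrative choices, not anything established):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- count(//*) forces the processor to walk the entire tree,
         so any time the fallback might save is already spent here -->
    <xsl:choose>
      <xsl:when test="count(//*) &gt; 10000">
        <xsl:call-template name="plain-view"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- hypothetical no-frills fallback: just dump the text content -->
  <xsl:template name="plain-view">
    <pre><xsl:value-of select="."/></pre>
  </xsl:template>

</xsl:stylesheet>
```

This illustrates the problem rather than solving it: the xsl:when test alone costs a full traversal of the 10MB document.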
For (2), the challenge would be how to process the first [certain amount] of the document and no more, again without inefficiently counting nodes.
One idea was to use <xsl:number level="any">, something like <xsl:variable name="element-num"><xsl:number count="*" level="any" /></xsl:variable> <xsl:if test="$element-num < $element-limit"> ... <!-- process this node and its descendants -->
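Spelled out as a complete template, the idea might look like this (the limit value and the copy-through body are illustrative; the real stylesheet does more per element):

```xml
<!-- sketch: number each element in document order and skip
     everything past an arbitrary cutoff -->
<xsl:param name="element-limit" select="1000"/>

<xsl:template match="*">
  <xsl:variable name="element-num">
    <!-- level="any" counts matching elements that precede this
         one in document order, plus this one -->
    <xsl:number count="*" level="any"/>
  </xsl:variable>
  <xsl:if test="number($element-num) &lt; $element-limit">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:if>
</xsl:template>
```

Note that elements past the limit are still visited (their templates fire and the test fails); what is saved is the per-element output work, not the traversal itself.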
Depending on how xsl:number is implemented, this might be efficient; or it might be O(n^2), where n is the element-limit.[1]
The way it's described in the XSLT Programmer's Reference (2nd ed), it sounds like O(n^2): "Starting at the current node, the processor walks backwards through the document in reverse document order, counting the number of nodes that match the count pattern..."
I realize that this description is probably meant to specify the semantics of xsl:number, not its implementation or performance characteristics. So XSLT processors might not implement xsl:number such that each invocation takes time proportional to the number of preceding nodes. But for a statement as general as xsl:number, with arbitrary count and from attributes, how likely is a processor to be smart enough to implement my use case efficiently? It doesn't seem trivial.
OK, I tried it, and it's not too bad! The 10MB document took about 3 sec to render the first 1000 elements. I think that's acceptable!
One might hope that a similar stylesheet could be used for prettyprinting XML efficiently in other environments... but the present problem is solved to my satisfaction.
Regards, Lars