Subject: GByte Transforms
From: Kevin Jones <kjones@xxxxxxxxxxx>
Date: Wed, 2 Jun 2004 20:11:41 +0100
We have been doing some work on the Sarvega processor to extend the size of the input documents it can cope with. In the latest revision we have managed a transform of a 5.5 GByte input document on standard 32-bit Intel PC hardware, using a customer sample transform. This is not an upper limit, but it is enough to shake out some 4 GByte boundary issues that we wanted to test.

While the memory usage of this type of transform has been well studied and no longer represents a particular challenge, we are rather more concerned about the performance issues of such large transforms from a stylesheet writer's point of view. I was hoping the list might be able to provide some feedback on these issues.

First, a little background. Transforms at this scale are never going to be measured in the usual milliseconds but in minutes to hours. Our current view is that absolute performance is less important than providing the capability in a predictable package. As mentioned, memory predictability is not something we really have an issue with (except perhaps if you use keys or similar), but running time is a problem. Any transform that exhibits linear or sub-linear running time with respect to the size of its input is generally fine. You can clearly test a transform to see whether it exhibits linear behaviour, but this is less than ideal, since real transform performance is often made up of different factors whose mix varies with input document size and structure.

The quest, then, is to find ways of writing stylesheets for these types of transform that give predictable performance results. It would be ideal for a processor to handle this without any fuss, but turning an arbitrary stylesheet into one that executes in linear time will never be viable for XSLT. Some of the more practical ideas we have been kicking around are:

- Ignore the problem; leave it to stylesheet writer testing.
- Extra smarts in the compiler to warn of potentially non-linear constructs, e.g. recursive templates that are not tail recursive, or nested loop/template constructions.
- As above, but aided by structural information for better targeting.
- As above, but with automatic re-writing where possible.
- Optional runtime monitoring for non-linear behaviour, perhaps as an addition to profile information.
- Runtime stop limits, e.g. terminate if predicted execution time (as monitored by the runtime) exceeds a limit.
- Non-linear algorithm replacement with linear but slower algorithms; applies to both the runtime and the compiler.
- Subset XSLT to limit the scope for non-linear transforms.
- And perhaps the most controversial: don't provide this support for XSLT, but only for XQuery, where predictability should be better.

I would be very interested in comments on these options, others you might know of, or any other issues that you may have come across when dealing with the performance of very large document transforms.

Thanks,
Kevin Jones
Sarvega Inc.
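To make the tail-recursion point concrete, here is a minimal XSLT 1.0 sketch (template and parameter names are my own, purely illustrative) contrasting a named template that cannot be tail-call optimised with an accumulator-passing rewrite that can be:

```xml
<!-- NOT tail recursive: the addition happens after the recursive
     call returns, so each node costs a pending stack frame. -->
<xsl:template name="sum-nodes">
  <xsl:param name="nodes"/>
  <xsl:choose>
    <xsl:when test="not($nodes)">0</xsl:when>
    <xsl:otherwise>
      <xsl:variable name="rest">
        <xsl:call-template name="sum-nodes">
          <xsl:with-param name="nodes" select="$nodes[position() > 1]"/>
        </xsl:call-template>
      </xsl:variable>
      <xsl:value-of select="$nodes[1] + $rest"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- Tail recursive: the running total travels down in an accumulator
     parameter, so the recursive call is the last thing the template
     does and a processor may reuse the stack frame. -->
<xsl:template name="sum-nodes-acc">
  <xsl:param name="nodes"/>
  <xsl:param name="acc" select="0"/>
  <xsl:choose>
    <xsl:when test="not($nodes)"><xsl:value-of select="$acc"/></xsl:when>
    <xsl:otherwise>
      <xsl:call-template name="sum-nodes-acc">
        <xsl:with-param name="nodes" select="$nodes[position() > 1]"/>
        <xsl:with-param name="acc" select="$acc + $nodes[1]"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

Note the caveat, which rather supports the point about hidden non-linearity: even the tail-recursive version may still be quadratic in time on processors that materialise `$nodes[position() > 1]` as a fresh node list on every call.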
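As a sketch of the "non-linear algorithm replacement" idea, the classic case is a cross-document join written as a nested scan versus the same lookup through `xsl:key`. The element names (`line`, `product`, `@ref`) here are hypothetical, just to show the shape:

```xml
<!-- O(n * m): for every line element, the predicate rescans the
     whole set of product elements in the document. -->
<xsl:template match="line">
  <xsl:value-of select="//product[@id = current()/@ref]/name"/>
</xsl:template>

<!-- Roughly O(n + m): build the index once, then each lookup through
     key() is (typically) a hash probe rather than a scan. -->
<xsl:key name="product-by-id" match="product" use="@id"/>

<xsl:template match="line">
  <xsl:value-of select="key('product-by-id', @ref)/name"/>
</xsl:template>
```

This is also exactly the trade-off flagged above: the key buys predictable running time at the cost of the index's memory, which is the one place where memory predictability does become an issue at multi-gigabyte scale.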