Subject: Re: [xsl] Managing debug logging in complex transforms: what do people do?
From: Graydon <graydon@xxxxxxxxx>
Date: Wed, 26 Mar 2014 15:40:54 -0400
On Wed, Mar 26, 2014 at 02:56:08PM -0400, Liam R E Quin scripsit:
> On Mon, 2014-03-24 at 18:02 -0400, Graydon wrote:
> [...]
> > Single digit integer minutes, quite often, outside debug mode.
>
> I remember a client with a transformation that was taking I think 20
> minutes on a relatively small file; in the end I preprocessed the style
> sheet to add a trace message at the start of each template, and ran that
> through a simple program (I might've written a Perl script to do it) that
> timestamped each line of the trace.

We used Saxon's -T "trace" mode and processed the output to identify miscreant templates, which were then optimized. Some of them were seriously miscreant, too; it's difficult to re-group mixed content efficiently when you're moving parts of it somewhere else, and there was a lot of external lookup on top of that. I also found that the tip-over point between "load the file into a big variable and search it with XPath" and "load the file into a big variable and use keys" was around a thousand entries, in favour of keys.

The original run time, on Cygwin rather than real POSIX, could be over 8 hours. It got down to ~10 minutes for that set of input through various process changes, along with improving the problematic templates.

These were often large, complex files -- legislative acts and regulations, which vary from "don't fish there on Thursdays" to the entire Income Tax Act -- where we had to do a lot of restructuring. There were something like twelve or fourteen full passes through the data, with most of the passes involved in properly building the legislative citations for anything with a number. The single-digit-integer-minutes run times were pretty good for that input.

I think the combination of twelve passes and a 60 MB input file (they were by no means all that big, but inevitably the larger ones had the weirder problems) was working out to 12 x 5 x 60 = 3.6 GB of parsed XML for the Java heap, plus non-trivial debugging overhead, and that was crushing the debugger under swapping load.
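The keyed-lookup tip-over described above can be sketched roughly as follows. This is an illustrative XSLT 2.0 fragment, not the actual stylesheet from the thread; the names `lookup.xml`, `entry`, `@id`, `ref`, and `@target` are all hypothetical:

```xml
<!-- Declared at the top level of the stylesheet.  Saxon builds the
     index once per document, so each subsequent key() call is a
     near-constant-time hash lookup. -->
<xsl:key name="entry-by-id" match="entry" use="@id"/>

<!-- The external lookup document, loaded once into a variable. -->
<xsl:variable name="lookup" select="doc('lookup.xml')"/>

<xsl:template match="ref">
  <!-- Linear XPath scan: re-walks the whole lookup document on every
       call.  Fine for small lookup sets, and avoids the index-build
       cost for one-off lookups:

       <xsl:copy-of select="$lookup//entry[@id eq current()/@target]"/>
  -->

  <!-- Keyed lookup (XSLT 2.0 three-argument key()): pays the index
       build once, then each lookup is cheap.  In Graydon's measurements
       this wins once the lookup set reaches roughly a thousand entries. -->
  <xsl:copy-of select="key('entry-by-id', @target, $lookup)"/>
</xsl:template>
```

The trade-off is exactly the one the post describes: the predicate scan costs O(n) per lookup with no setup, while the key costs one index build plus near-constant access, so the break-even point depends on how many lookups you perform against how many entries.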
The non-debugging processing looked a lot more able to throw intermediate results away when it was done with them. Small input debugged fine, and merely smaller input would at least debug; it was only the really big stuff that was hopeless.

[snip]

> On the other hand I don't recommend optimizing something that doesn't
> yet work, as the more work you put into it, the less you'll be willing
> to rewrite altogether when you find parts that are wrong :-) and it's a
> waste of your time, of course.

"Premature optimization is the root of all evil." Complete agreement from me!

-- Graydon