Subject: Re: [xsl] [xslt performance for big xml files]
From: Aditya Sakhuja <aditya.sakhuja@xxxxxxxxx>
Date: Sun, 26 Apr 2009 11:18:04 -0700
Thank you very much for the inputs! As a result, my experiments have shown
some encouraging results too.

1> On splitting my 30 MB file into 60 splits, I was able to get a massive
speed-up, with the overall transformation completing in under 3 minutes.

2> Did some code-level optimization (standard ones, avoiding non-essential
computations) and got a massive improvement here too. Tried to avoid
call-templates wherever possible, replaced my loops with apply-templates,
and eliminated usage of // entirely. A lot of performance gain so far.
Looking to do some more code-level optimization in the coming hours.

By the way, I am trying to do the split and merge using custom PHP
functions. Is there a more elegant way of doing this?

Thanks,
Aditya

On Sat, Apr 25, 2009 at 1:50 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>
> It's possible to write slow programs in any language, but high-level
> declarative languages like XSLT and SQL make it easier!
>
> I would think there's something in your code that makes the performance
> quadratic (or worse) with respect to source document size. This is
> nothing to do with XSLT, it's to do with your own code. To test this
> theory, try plotting how the stylesheet performance varies as you
> increase the document size: 1 MB, 2 MB, 4 MB, 8 MB.
>
> It's possible that a processor that optimizes joins (Saxon-SA is the
> only one I know) would get rid of the quadratic behaviour. On the other
> hand, it might not - without seeing your code, all is guesswork. The
> usual solution to quadratic behaviour, however, is to optimize your code
> "by hand" using keys.
>
> I would be very surprised if your transformation can't be done in under
> a minute by some appropriate tweaking. 30 MB is not big these days. The
> fact that you don't have a memory problem means that streaming isn't
> going to help.
> You might get a tenfold improvement just by running the same code under
> a different processor (or you might not), but you're looking for a
> factor of 1000 improvement, and unless you hit lucky with the optimizer,
> that will only come from improving your own code.
>
> Michael Kay
> http://www.saxonica.com/
>
>> I am looking for some tips on performance tuning for my XSLT, which
>> processes a 30 MB file on average. I am having some serious issues
>> getting the processing completed under 24 hrs. The transformation is
>> completely CPU bound (memory usage is hardly 3-4%). CPU utilization
>> remains around 99% throughout.
>>
>> My specific question here is whether these ideas would help reduce
>> processing time:
>>
>> 1> Splitting the big XML file into multiple files and then feeding
>> them to the xsltproc processor. Does that sound like the right thing
>> to do, to reduce the processing time overall?
>>
>> 2> I have done my testing using xsltproc (libxml2). Would the Saxon
>> processor be an advantage here?
>>
>> 3> Does XSLT processing not fit large XML file processing? Should I
>> try stream-based processing instead, if XSLT does not scale?
>>
>> I am performing experiments in parallel, but wanted to get feedback
>> from more experienced people with XSLT.
>>
>> Thanks in advance,
>>
>> --
>> -Aditya

--
-Aditya
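[For readers following the thread: the key-based optimization Michael Kay describes typically looks like the sketch below. The element and attribute names (`customer`, `order`, `@id`, `@custref`) are hypothetical, since the original stylesheet was never posted; the pattern is standard XSLT 1.0 and works in xsltproc and Saxon alike.]

```xml
<!-- Declare an index over customer elements, keyed on their id attribute. -->
<xsl:key name="cust-by-id" match="customer" use="@id"/>

<!-- A quadratic cross-document scan such as:
       select="//customer[@id = current()/@custref]"
     re-walks the whole tree for every order. Replacing it with a key()
     lookup turns each lookup into a near-constant-time hash probe: -->
<xsl:template match="order">
  <xsl:apply-templates select="key('cust-by-id', @custref)"/>
</xsl:template>
```

With N orders and M customers, the `//customer[...]` predicate costs roughly O(N x M), while the keyed version is built once and probed per order, which is usually the difference between hours and seconds on a 30 MB input.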