Subject: Re: [xsl] [xslt performance for big xml files]
From: Aditya Sakhuja <aditya.sakhuja@xxxxxxxxx>
Date: Sun, 26 Apr 2009 11:18:04 -0700
Thank you very much for the input! As a result, my experiments have
shown some encouraging results too.

1> Splitting my 30 MB file into 60 chunks gave a massive speed-up.
The overall transformation now completes in under 3 minutes.
2> Did some code-level optimization (standard things, avoiding
non-essential computations) and got a massive improvement here too. I
avoided call-template wherever possible, replaced my loops with
apply-templates, and eliminated the use of // entirely.
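For anyone following the thread, here is a sketch of the kind of rewrite I mean (the element names `orders`, `order`, `customer`, and the `@custid` attribute are made up for illustration): a push-style apply-templates instead of an explicit loop plus call-template, and a key instead of a repeated // lookup.

```xml
<!-- Before (slow): explicit loop, and a // search re-run for every order -->
<xsl:for-each select="//order">
  <xsl:call-template name="render-order"/>
</xsl:for-each>

<!-- After (faster): push style, with a key for the repeated lookup -->
<xsl:key name="cust-by-id" match="customer" use="@id"/>

<xsl:template match="orders">
  <xsl:apply-templates select="order"/>
</xsl:template>

<xsl:template match="order">
  <!-- key() uses a prebuilt index instead of rescanning the document -->
  <xsl:value-of select="key('cust-by-id', @custid)/name"/>
</xsl:template>
```

The key-based lookup is also the usual cure for the quadratic behaviour Michael describes below, since it turns each per-node search into a hash lookup.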

A lot of performance gain so far. I am looking to do some more
code-level optimization in the coming hours.

By the way, I am currently doing the split and merge using custom PHP
functions. Is there a more elegant way of doing this?
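If an XSLT 2.0 processor such as Saxon is an option, the split can be done in the stylesheet itself with xsl:result-document rather than in PHP. A minimal sketch, assuming the input is a flat list of `record` elements under the root and a chunk size of 500 (both illustrative):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Write each group of 500 record elements to its own output file -->
  <xsl:template match="/*">
    <xsl:for-each-group select="record"
        group-adjacent="(position() - 1) idiv 500">
      <xsl:result-document href="chunk{current-grouping-key()}.xml">
        <records>
          <xsl:copy-of select="current-group()"/>
        </records>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

This keeps the whole pipeline in XSLT, though with xsltproc (XSLT 1.0 only) an external split step is still needed.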

Thanks,
Aditya

On Sat, Apr 25, 2009 at 1:50 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>
>
> It's possible to write slow programs in any language, but high-level
> declarative languages like XSLT and SQL make it easier!
>
> I would think there's something in your code that makes the performance
> quadratic (or worse) with respect to source document size. This is nothing
> to do with XSLT, it's to do with your own code. To test this theory, try
> plotting how the stylesheet performance varies as you increase the document
> size: 1Mb, 2Mb, 4Mb, 8Mb.
>
> It's possible that a processor that optimizes joins (Saxon-SA is the only
> one I know) would get rid of the quadratic behaviour. On the other hand, it
> might not - without seeing your code, all is guesswork. The usual solution
> to quadratic behaviour, however, is to optimize your code "by hand" using
> keys.
>
> I would be very surprised if your transformation can't be done in under a
> minute by some appropriate tweaking. 30Mb is not big these days. The fact
> that you don't have a memory problem means that streaming isn't going to
> help.
>
> You might get a tenfold improvement just by running the same code under a
> different processor (or you might not), but you're looking for a factor of
> 1000 improvement, and unless you hit lucky with the optimizer, that will
> only come from improving your own code.
>
> Michael Kay
> http://www.saxonica.com/
>
>>
>> I am looking for some tips on performance tuning for my XSLT,
>> which processes a 30 MB file on average. I am having some
>> serious issues getting the processing completed in under 24 hrs.
>> The transformation is completely CPU bound (memory usage is hardly
>> 3-4%). CPU utilization remains around 99% throughout.
>>
>> My specific question here is whether these ideas would help
>> reduce processing time:
>>
>> 1> Splitting the big XML file into multiple files and then feeding
>> them to the xsltproc processor. Does that sound like the right
>> approach to reduce the overall processing time?
>> 2> I have done my testing using xsltproc (libxml2). Would the Saxon
>> processor be an advantage here?
>> 3> Does XSLT processing not fit large XML files?
>> Should I look at stream-based processing instead,
>> if XSLT does not scale?
>>
>> I am performing experiments in parallel, but wanted to get
>> feedback from people more experienced with XSLT.
>>
>> Thanks in advance,
>>
>> --
>> -Aditya
>
>



--
-Aditya
