RE: [xsl] [xslt performance for big xml files]

Subject: RE: [xsl] [xslt performance for big xml files]
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Sat, 25 Apr 2009 09:50:09 +0100
It's possible to write slow programs in any language, but high-level
declarative languages like XSLT and SQL make it easier!

I would think there's something in your code that makes the performance
quadratic (or worse) with respect to source document size. This has nothing
to do with XSLT; it comes from your own code. To test this theory, try
plotting how the stylesheet's run time varies as you increase the document
size: 1MB, 2MB, 4MB, 8MB. If the behaviour is quadratic, each doubling of
the size will roughly quadruple the run time.

It's possible that a processor that optimizes joins (Saxon-SA is the only
one I know) would get rid of the quadratic behaviour. On the other hand, it
might not - without seeing your code, all is guesswork. The usual solution
to quadratic behaviour, however, is to optimize your code "by hand" using
keys.
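
As a minimal sketch of what such a key-based rewrite can look like, assume
a hypothetical input in which each /data/orders/order carries a customer-id
attribute that refers to a /data/customers/customer by its id attribute.
These element and attribute names, and the key name, are invented for
illustration and are not taken from the original poster's stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input shape (illustrative names only):
       /data/orders/order[@id][@customer-id]
       /data/customers/customer[@id]/name -->

  <!-- Declare an index of customer elements by their id attribute.
       Processors typically build it once, on first use. -->
  <xsl:key name="customer-by-id" match="customer" use="@id"/>

  <xsl:template match="/">
    <report>
      <xsl:apply-templates select="data/orders/order"/>
    </report>
  </xsl:template>

  <xsl:template match="order">
    <!-- Form to avoid: without join optimization, this predicate
         rescans every customer for every order (quadratic overall):
         /data/customers/customer[@id = current()/@customer-id] -->
    <!-- Keyed form: one index lookup per order. -->
    <line order="{@id}"
          customer="{key('customer-by-id', @customer-id)/name}"/>
  </xsl:template>

</xsl:stylesheet>
```

The sketch sticks to XSLT 1.0 so that it should run under xsltproc as well
as Saxon. Whether it helps depends on whether the slow part of the real
stylesheet is a lookup of this shape.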

I would be very surprised if your transformation can't be done in under a
minute by some appropriate tweaking. 30MB is not big these days. The fact
that you don't have a memory problem means that streaming isn't going to
help.

You might get a tenfold improvement just by running the same code under a
different processor (or you might not), but you're looking for a factor of
1000 improvement, and unless you get lucky with the optimizer, that will
only come from improving your own code.

Michael Kay
http://www.saxonica.com/

> 
> I am looking for some tips on performance tuning for my xslt
> which processes a 30 MB file on average. I am having some
> serious issues getting the processing completed in under 24 hrs.
> The transformation is completely CPU bound (memory usage is hardly
> 3-4%). CPU utilization remains around 99% throughout.
> 
> My specific question here is whether these ideas would help
> reduce processing time:
> 
> 1> Splitting the big xml file into multiple files and then feeding them
> to the xsltproc processor. Does that sound like the right way to
> reduce the overall processing time?
> 2> I have done my testing using xsltproc (libxml2). Would the Saxon
> processor be an advantage here?
> 3> Is xslt not a good fit for processing large xml files?
> Should I look at other stream-based processing instead,
> if xslt does not scale?
> 
> I am performing experiments in parallel, but wanted to get
> feedback from people more experienced with xslt.
> 
> Thanks in advance,
> 
> --
> -Aditya
