
Subject: RE: [xsl] optimization for very large, flat documents
From: "Jim Neff" <jneff@xxxxxxxxxxxxxxx>
Date: Thu, 20 Jan 2005 13:24:32 -0500

Kevin,

I don't know if this would be of help to you, but I was having severe timing
issues too and I was able to cut my processing time dramatically.  

My original test file was only 400 KB and took about 50 seconds to process.
I went on to my next group, a 12 MB file, which took 8 minutes to
process! Memory usage varied from 100 to 300 MB.

I was doing a lot of these kinds of for-each select statements:

select="../Table[(CLM_CG_CUST_CODE = current()/CLM_CG_CUST_CODE) and (CLM_NO
= current()/CLM_NO) and (not(CTD_SYS_ID = preceding::Table/CTD_SYS_ID))]"

Basically, this asks the processor to compare the current node to every other
node in the source tree at each level (and this example is only level 2 of 4).

The net result is that processing time grows roughly quadratically: more time
is needed per record as you add more records.

I learned how to use the xsl:for-each-group instruction (XSLT 2.0), and now my
grouping statements look like this:

<xsl:for-each-group select="current-group()" group-by="CTD_SYS_ID">

And now I get the same results but processing takes 6 seconds!!!
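To give a rough idea of the shape, a nested version over the same keys looks
something like the sketch below; the wrapper element names (Customer, Claim,
Detail) are made-up placeholders, not my real output:

<!-- Sketch only: nested grouping over the same keys as the old predicate.
     The wrapper element names are placeholders for illustration. -->
<xsl:for-each-group select="Table" group-by="CLM_CG_CUST_CODE">
  <Customer code="{current-grouping-key()}">
    <xsl:for-each-group select="current-group()" group-by="CLM_NO">
      <Claim number="{current-grouping-key()}">
        <xsl:for-each-group select="current-group()" group-by="CTD_SYS_ID">
          <Detail id="{current-grouping-key()}">
            <!-- each distinct CTD_SYS_ID within a claim is handled once -->
          </Detail>
        </xsl:for-each-group>
      </Claim>
    </xsl:for-each-group>
  </Customer>
</xsl:for-each-group>

Each distinct key value is now handled exactly once, instead of every node
being compared against all of its preceding nodes.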


Just another XSLT success story ;)

--Jim Neff





-----Original Message-----
From: Kevin Rodgers [mailto:kevin.rodgers@xxxxxxx] 
Sent: Thursday, January 20, 2005 1:09 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] optimization for very large, flat documents

Thanks to everyone who responded.  For now I plan to follow Pieter's idea of
chunking the data into manageable pieces (16-64 MB).  Then I'm going to look
into Michael's suggestions about STX (unfortunately, not yet a W3C
recommendation and thus not widely implemented) and XQuery.

For anyone interested in some numbers, I've split each of my two large files
(613 MB and 656 MB) into subfiles of 16 K independent entries each (the entries
vary in size), yielding sets of 25 and 37 subfiles (approx. 25 MB and 17 MB
each, respectively).  I process them by running Saxon 8.2 from the command line
(with an -Xmx value of 8 times the file size) on a Sun UltraSPARC with 2 GB
of real memory.  The set of 37 subfiles (17 MB each) is processed with a
slightly simpler stylesheet and takes about 1:15 (minutes:seconds) per file;
the set of 25 subfiles (25 MB each) uses one document() call per entry to/from
a servlet on a different host and takes about 8 minutes per file.
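
For reference, a typical invocation looks something like this (the file and
stylesheet names are just placeholders; -Xmx is roughly 8 times the ~25 MB
subfile size):

java -cp saxon8.jar -Xmx200m net.sf.saxon.Transform -o entries-01.out.xml entries-01.xml transform.xsl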

My next step is to use Saxon's profiling features to find out where I can
improve my stylesheet's performance.

Thanks again to everyone on xsl-list for all your help!
--
Kevin Rodgers
