Subject: RE: [xsl] optimization for very large, flat documents
From: "Jim Neff" <jneff@xxxxxxxxxxxxxxx>
Date: Thu, 20 Jan 2005 13:24:32 -0500
Kevin,

I don't know if this would be of help to you, but I was having severe timing issues too, and I was able to cut my processing time dramatically.

My original test file was only 400 KB and took about 50 seconds to process. I went on to my next group, a 12 MB file, and it took 8 minutes to process! Memory usage varied from 100 to 300 MB.

I was doing a lot of these kinds of for-each select statements:

  select="../Table[(CLM_CG_CUST_CODE = current()/CLM_CG_CUST_CODE)
                   and (CLM_NO = current()/CLM_NO)
                   and (not(CTD_SYS_ID = preceding::Table/CTD_SYS_ID))]"

That basically asks the processor to compare the current node against every other node in the source tree, once per level (and this example is only level 2 of 4). The net result is that running time grows roughly quadratically: more time is needed per record as you add more records.

I learned how to use the for-each-group instruction, and now my grouping statements look like this (a sketch of the full nested grouping appears after this message):

  <xsl:for-each-group select="current-group()" group-by="CTD_SYS_ID">

Now I get the same results, but processing takes 6 seconds!

Just another XSLT success story ;)

--Jim Neff

-----Original Message-----
From: Kevin Rodgers [mailto:kevin.rodgers@xxxxxxx]
Sent: Thursday, January 20, 2005 1:09 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] optimization for very large, flat documents

Thanks to everyone who responded. For now I plan to follow Pieter's idea of chunking the data into manageable pieces (16-64 MB). Then I'm going to look into Michael's suggestions about STX (unfortunately not yet a W3C recommendation, and thus not widely implemented) and XQuery.

For anyone interested in some numbers: I've split each of my two large files (613 MB and 656 MB) into subfiles of 16 K independent entries apiece (the entries vary in size), yielding sets of 25 and 37 subfiles (of approx. 25 MB and 17 MB each, respectively). I process them by running Saxon 8.2 from the command line (with an -Xmx value of 8 times the file size) on a Sun UltraSPARC with 2 GB of real memory. The set of 37 17-MB subfiles is processed with a slightly simpler stylesheet and takes about 1:15 (minutes:seconds) per file; the set of 25 25-MB subfiles makes one document() call per entry to/from a servlet on a different host and takes about 8 minutes per file.

My next step is to use Saxon's profiling features to find out where I can improve my stylesheet's performance.

Thanks again to everyone on xsl-list for all your help!

--
Kevin Rodgers
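For readers who want to see the complete shape of Jim's fix, here is a minimal sketch of the nested XSLT 2.0 grouping he describes. Only the element names (Table, CLM_CG_CUST_CODE, CLM_NO, CTD_SYS_ID) and the inner for-each-group line come from his message; the match pattern, the nesting order, the <claims> wrapper, and what gets emitted per group are assumptions for illustration, not his actual stylesheet:

  <?xml version="1.0" encoding="UTF-8"?>
  <xsl:stylesheet version="2.0"
                  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
      <claims>
        <!-- Group once per level instead of re-scanning the whole
             tree with preceding:: for every node. -->
        <xsl:for-each-group select="//Table" group-by="CLM_CG_CUST_CODE">
          <xsl:for-each-group select="current-group()" group-by="CLM_NO">
            <xsl:for-each-group select="current-group()" group-by="CTD_SYS_ID">
              <!-- Inside for-each-group the context item is the first
                   Table of each group, so this emits one representative
                   row per distinct CTD_SYS_ID. -->
              <xsl:copy-of select="."/>
            </xsl:for-each-group>
          </xsl:for-each-group>
        </xsl:for-each-group>
      </claims>
    </xsl:template>
  </xsl:stylesheet>

A typical processor evaluates group-by in a single pass using a table of grouping keys, so each level is one sweep over its input rather than one scan of the tree per node, which is consistent with the drop Jim reports from 8 minutes to 6 seconds.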
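For concreteness, a Saxon 8 command line of the kind Kevin describes might look like the following. The file names are invented, and -Xmx136m reflects his stated rule of thumb (8 times the size of a 17 MB subfile); net.sf.saxon.Transform is the Saxon 8 XSLT entry point and -o names the output file:

  java -Xmx136m net.sf.saxon.Transform -o chunk01.out.xml chunk01.xml transform.xsl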