Subject: RE: [xsl] Processing Memory-Hungry Data Sets with XSLT 2
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 12 Mar 2008 00:05:04 -0000
Almost any performance question is processor-specific to some extent. However, it's not unlikely that different processors use similar implementation techniques much of the time. Given your description of the problem, I would be looking for unnecessary temporary trees and copy operations. With Saxon it's usually the case that tree construction (xsl:variable with content and no "as" attribute) is done eagerly, whereas sequence construction (xsl:variable with a select attribute) is done lazily. But with performance the devil is always in the detail, and sometimes it can be in quite surprising places in the detail.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Eliot Kimber [mailto:ekimber@xxxxxxxxxxxx]
> Sent: 11 March 2008 19:51
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Processing Memory-Hungry Data Sets with XSLT 2
>
> I'm implementing some DITA processing that is applied against a large
> tree of maps and topics referenced from the maps, in order to generate
> HTML from the maps and the topics. There are tens of thousands of maps
> and topics.
>
> I have two processors: one is essentially an identity transform that
> processes the map tree and copies it to the output with a little bit
> of modification. The other is the XML-to-HTML transform. It is still
> essentially a one-to-one file-to-file transform, but the result files
> are HTML instead of copies. The process does a top-down walk of the
> tree of maps, which consist of either links to submaps or links to
> topics. Submaps are loaded and their topic links processed. Links to
> topics result in loading the target topics and processing them
> normally to generate HTML output. This obviously results in a lot of
> source and target documents in memory. The business logic is very
> simple; it's just a lot of data.
> Using Saxon 9, the first script can process my entire corpus, but the
> second one (the HTML generator) fails about halfway through with an
> out-of-memory error using the largest VM I can request under OS X
> (2 GB).
>
> I tried using Saxon's extension function discard-document(), but that
> appeared to have no effect. (I didn't really expect it to, since I
> don't think anything referenced ever gets unreferenced.)
>
> My question is: are there any XSLT 2 techniques that might help avoid
> this type of memory usage issue that are generic (as opposed to
> Saxon-specific)? I can think of several multi-pass approaches
> involving the creation of intermediate files that would work, but time
> is short, so I'm trying to keep this as simple as I can and still have
> it work. I was hoping there might be some clever way to make an
> otherwise naive top-down process more memory-efficient.
>
> If the only answer is Saxon-specific, then I'll move my question to
> the Saxon list.
>
> Thanks,
>
> Eliot
> --
> Eliot Kimber
> Senior Solutions Architect
> "Bringing Strategy, Content, and Technology Together"
> Main: 610.631.6770
> www.reallysi.com
> www.rsuitecms.com
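[Editor's note: a minimal sketch of the eager-vs-lazy distinction Michael describes, and of the discard-document() call Eliot mentions. The variable names and the $map-uri/$href parameters are hypothetical illustrations, not taken from the thread; <topicref> is the standard DITA map element.]

```xml
<!-- Hypothetical sketch; assumes $map-uri and $href resolve to documents. -->

<!-- Eager: element content with no "as" attribute makes Saxon build a
     temporary document tree, copying every matched node into memory. -->
<xsl:variable name="topics-tree">
  <xsl:sequence select="doc($map-uri)//topicref"/>
</xsl:variable>

<!-- Lazy: a select attribute binds a sequence of references to the
     existing source nodes; nothing is copied, and evaluation can be
     deferred until the variable is first used. -->
<xsl:variable name="topics-seq" select="doc($map-uri)//topicref"/>

<!-- Saxon-specific: saxon:discard-document() removes the document from
     Saxon's document pool so it can be garbage-collected once no nodes
     in it are still referenced (declare xmlns:saxon="http://saxon.sf.net/"). -->
<xsl:variable name="topic" select="saxon:discard-document(doc($href))"/>
```

The select form avoids the copy operation Michael warns about; but as Eliot notes, discard-document() only helps if the stylesheet eventually drops all references to nodes in the loaded document.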