Subject: RE: [xsl] optimization for very large, flat documents
From: Pieter Reint Siegers Kort <pieter.siegers@xxxxxxxxxxx>
Date: Tue, 18 Jan 2005 18:30:12 -0600
Hi Kevin,

It has to do with the way the input source is built in memory, i.e., a tree-like structure with relationships between nodes. Likewise, the result the XSLT produces is actually another tree, which is serialized by the application that saves it to disk; serialization is not part of the XSL transformation itself, and on disk the data is stored in a very different way. In the end, to work with a big input source (or a big stylesheet), the whole thing must be parsed and rebuilt *completely* in memory to recover the same tree-like structure it had before it was saved.

You could use a SAX-like approach, but I'm not sure how to do that - maybe others can jump in here. But if, as you say, your entries are independent of each other, then another approach (one I would risk taking) is to read your big XML source in chunks, process each chunk as if it were a separate XML input source, and append the results to a common output file. Of course, the chunking must not be done by the XML parser and tree builder - it could very well be an application that opens the file, reads a chunk, passes it to the XSLT processor, and so on.

A final observation/question: why was that big 600 MB XML file created in the first place? I would have loaded the data into a database, queried it from there, applied the XSLTs, and saved the result in whatever format was needed. But then again, you (or whoever produced the file) must have had reasons to create it as one single file.

I would wish for a filesystem that could store an input source (whether XML or XSL) as a direct representation of its tree-like structure; that way we would be freed from using lots of memory (and gain performance?).

Cheers,
<prs/>

-----Original Message-----
From: Kevin Rodgers [mailto:kevin.rodgers@xxxxxxx]
Sent: Tuesday, 18 January 2005, 12:05 p.m.
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] optimization for very large, flat documents

I'm trying to process a very large (600 MB) flat XML document, a bibliography in which each of the 400,000 entries is completely independent of the others. According to the Saxon web site and mailing list, it will take approximately 5-10 times that (at least 3 GB) to hold the document tree in memory, which is impractical.

The Saxon mailing list also has some tips on how to accomplish this, but my question is: why doesn't XSLT provide a way to specify that a matched node can be processed independently of its predecessor and successor siblings? Alternatively, couldn't an XSLT processor infer that from the complete absence of XPath expressions that refer to predecessor and successor siblings?

--
Kevin Rodgers
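[Editor's note] The chunk-at-a-time, SAX-like reading suggested in the reply above can be sketched in a few lines of Python using `iterparse`, which builds only one entry's subtree at a time instead of the whole 600 MB document. This is a minimal sketch under assumptions: the element names `bibliography` and `entry` are hypothetical stand-ins for the real bibliography's markup, and the per-entry XSLT invocation is left as a comment since the thread does not specify a processor API.

```python
# Stream a large flat XML document entry by entry, keeping memory bounded
# by a single entry rather than the full document tree.
import xml.etree.ElementTree as ET
from io import StringIO

def stream_entries(source):
    """Yield each <entry> element in document order, discarding it
    from the in-memory tree once the caller has processed it."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)              # grab the document root element
    for event, elem in context:
        if event == "end" and elem.tag == "entry":
            yield elem                   # caller transforms this entry here
            root.clear()                 # free the finished entry's subtree

# Tiny in-memory demonstration (hypothetical markup):
sample = "<bibliography>" + "".join(
    f"<entry id='{i}'><title>T{i}</title></entry>" for i in range(3)
) + "</bibliography>"

titles = [e.find("title").text for e in stream_entries(StringIO(sample))]
# Each yielded entry would be handed to the XSLT processor as its own
# small input source, and the results appended to a common output file.
```

In a real driver, `StringIO(sample)` would be replaced by an open file handle on the 600 MB document; the `root.clear()` call is what keeps the parser from accumulating all 400,000 entries in memory.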