Subject: RE: [xsl] optimization for very large, flat documents
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 19 Jan 2005 09:16:56 -0000
> I'm trying to process a very large (600 MB) flat XML document, a
> bibliography where each of the 400,000 entries is completely independent
> of the others. According to the Saxon web site and mailing list, it'll
> take approx. 5-10 times that (3 GB) to hold the document tree in memory,
> which is impractical. The Saxon mailing list also has some tips about
> how to accomplish this, but my question is: Why doesn't XSLT provide a
> way to specify that a matched node can be processed independently of its
> predecessor and successor siblings? Alternatively, couldn't an XSLT
> processor infer that from the complete absence of XPath expressions that
> refer to predecessor and successor siblings?

I think the reason that XSLT vendors have not tried this approach is:

(a) there are rather few stylesheets where the technique works, and can be
seen statically to work. It's not enough that all path expressions should
select downwards: there must be no absolute path expressions, no global
variables that select from the initial context node, no keys, and probably
quite a few other conditions besides.

(b) for such stylesheets, a completely different run-time approach is
needed: effectively, a different XSLT processor.

I think that in practice if you want to do serial transformation then a
functional language is not the right answer: if you can only look at each
piece of input data once, then you need the ability to remember what you
have seen, so you need a procedural language with updatable memory. That's
why STX was invented.

However, I think there is scope for someone to package up the idea of
running an XSLT transform on each "record" in a large file, and then
recombining the results.

Michael Kay
http://www.saxonica.com/
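The "transform each record, then recombine" idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Saxon's or STX's approach: it uses Python's standard-library `xml.etree.ElementTree.iterparse` to stream the input, assumes the records are direct children of the document element (a flat file like the bibliography in the question), and uses a plain Python function as a stand-in for a per-record XSLT stylesheet. The names `transform_stream` and `entry_to_ref` and the `<entry>`, `<ref>`, and `<refs>` element names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Callable, TextIO


def entry_to_ref(entry: ET.Element) -> ET.Element:
    """Hypothetical per-record transform standing in for an XSLT stylesheet.

    Maps <entry id="..."><title>T</title><year>Y</year></entry>
    to   <ref id="...">T (Y)</ref>, looking only inside the record itself.
    """
    ref = ET.Element("ref", id=entry.get("id", ""))
    ref.text = f"{entry.findtext('title', '')} ({entry.findtext('year', '')})"
    return ref


def transform_stream(
    source: IO[bytes],
    out: TextIO,
    record_tag: str,
    transform: Callable[[ET.Element], ET.Element],
    wrapper_tag: str = "results",
) -> None:
    """Apply `transform` to each <record_tag> element and recombine the output.

    Assumes records are direct children of the document element. Each
    finished record is serialized and then removed from the tree, so memory
    use stays roughly proportional to one record, not the whole document.
    """
    out.write(f"<{wrapper_tag}>")
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # first event: start of the document element
    for event, elem in events:
        if event == "end" and elem.tag == record_tag:
            # The record is complete here; transform and serialize it.
            out.write(ET.tostring(transform(elem), encoding="unicode"))
            root.remove(elem)  # discard the processed record
    out.write(f"</{wrapper_tag}>")


if __name__ == "__main__":
    bib = io.BytesIO(
        b'<bib><entry id="1"><title>A</title><year>1999</year></entry>'
        b'<entry id="2"><title>B</title><year>2005</year></entry></bib>'
    )
    buf = io.StringIO()
    transform_stream(bib, buf, "entry", entry_to_ref, wrapper_tag="refs")
    print(buf.getvalue())
```

This only works when each record's transform depends on nothing outside that record, which is the same restriction described in point (a): no absolute paths, no global variables over the whole input, no keys spanning records. With the third-party lxml library, an `lxml.etree.XSLT` stylesheet could plausibly replace the Python function (not shown here).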