Subject: Re: [xsl] Applying Streaming To DITA Processing: Looking for Guidance
From: "Jirka Kosek jirka@xxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 9 Oct 2014 15:47:11 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 9.10.2014 16:16, Eliot Kimber ekimber@xxxxxxxxxxxx wrote:

> Can streaming help, either with overall processing efficiency or
> with memory usage?

Yes. The typical motivation for streaming is reducing memory
consumption; in your case it's very unlikely that you will gain any
performance benefit.

> Where would I go today or in the near future to gain the
> understanding of streaming required to answer these questions
> (other than the XSLT 3 spec itself, obviously)?

Several talks and papers on streaming have been presented in past
years at both the XML Prague and Balisage conferences. For example:

https://www.youtube.com/watch?v=OeSQ4ompB1g&index=6&list=PLQpqh98e9RgXPGvJaNsE3b1Sqncz6MGvr
https://www.youtube.com/watch?v=kzGZvh-FbNw&list=PLQpqh98e9RgXPGvJaNsE3b1Sqncz6MGvr&index=7

If there is enough interest, I can try to organize a streaming
workshop or something similar as part of XML Prague 2015
(http://xmlprague.cz).

> Because my data collection process is copying data to a new result,
> I'm pretty sure it's inherently streamable: I'm just processing
> documents in an order determined by a normal depth-first tree walk
> of the map structure (a hierarchy of hyperlinks to topics) and
> grabbing relevant data (e.g., division titles, figure titles, index
> entries, etc.). If this was all I was doing, then for sure
> streaming would help memory usage.
>
> But because I must then process each topic again to generate the
> final result, and that process is not directly streamable, would
> streaming the first phase help overall?

You can split your transformation into two steps: the first will be
streamable and the second will not. Compared to the current situation
you will save around 50% of the memory.

> Taken a step further: are there implementation techniques I could
> apply in order to make the second phase streamable (e.g.,
> collecting the information needed to render cross references
> without having to fetch the target elements) and could I expect
> that to then provide enough performance improvement to justify the
> implementation cost?

You can do this. You can process the "compiled grand-source document"
in streaming mode and make lookups in a smaller document holding the
cross-referencing data in non-streaming mode.

> The current code is both mature and relatively naive in its
> implementation. Reworking it to be streamable could entail a
> significant refactoring (maybe, that's part of what I'm trying to
> determine).
>
> The actual data processing cost is more or less fixed, so unless
> streaming makes the XSLT operations faster, I wouldn't expect
> streaming by itself to reduce processing time.

It's very unlikely that a streaming rewrite will make your code
faster. Of course, lookups in a small auxiliary cross-reference file
will be faster than lookups in a large document, but if you use keys
today, the difference shouldn't be very big.

> However, the primary concern in this use case is memory usage:
> currently, memory required is proportional to the number of topics
> in the publication, whereas it could be limited to simply the
> largest topic plus the size of the collected data itself (which is
> obviously much smaller than the size of the topics as it includes
> the minimum data needed to enable numbering and such).

I don't know how large your documentation set is, but I would be
surprised if it couldn't fit into memory (who would read it then? :-).
Streaming is generally useful when it's impossible to load documents
into memory, which on current machines means processing XML files
that are gigabytes in size.

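To make the two-step idea concrete, here is a rough, language-neutral sketch in Python rather than XSLT. It only illustrates the shape of the approach: one streaming pass collects a small id-to-title table, and a second streaming pass resolves cross references against that table instead of fetching the target elements. The element and attribute names (pub, topic, title, xref, href) and both function names are simplified assumptions made for illustration, not real DITA markup or anything from the actual code base.

```python
import io
import xml.etree.ElementTree as ET


def collect_titles(source):
    """Pass 1 (streaming): build a small id -> title lookup table."""
    titles = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "topic":  # hypothetical element name
            titles[elem.get("id")] = elem.findtext("title", default="")
            # Drop the finished topic's content; only an empty shell
            # stays attached to the root element.
            elem.clear()
    return titles


def render_xrefs(source, titles):
    """Pass 2 (streaming): resolve each xref from the in-memory table."""
    rendered = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "xref":  # hypothetical element name
            rendered.append(titles.get(elem.get("href"), "(unresolved)"))
        elif elem.tag == "topic":
            elem.clear()
    return rendered


# Tiny stand-in for the "compiled grand-source document".
SAMPLE = b"""<pub>
  <topic id="t1"><title>Install</title><p>See <xref href="t2"/>.</p></topic>
  <topic id="t2"><title>Configure</title><p>Back to <xref href="t1"/>.</p></topic>
</pub>"""

titles = collect_titles(io.BytesIO(SAMPLE))
resolved = render_xrefs(io.BytesIO(SAMPLE), titles)
```

In this sketch the only data that stays in memory across the whole run is the lookup table, so the forward reference from t1 to t2 resolves without either topic being held while the other is processed.
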
Jirka

- --
- ------------------------------------------------------------------
Jirka Kosek    e-mail: jirka@xxxxxxxx    http://xmlguru.cz
- ------------------------------------------------------------------
Professional XML consulting and training services
DocBook customization, custom XSLT/XSL-FO document processing
- ------------------------------------------------------------------
OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 rep.
- ------------------------------------------------------------------
Bringing you XML Prague conference    http://xmlprague.cz
- ------------------------------------------------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iEYEARECAAYFAlQ2re4ACgkQzwmSw7n0dR6shwCffITFOIsRjAVeUE+XI4c6vHmt
UEAAn1ssKI6bxGb59UYqi67McfirpoL1
=a1hq
-----END PGP SIGNATURE-----