Subject: Summary of DocBook HTML speedup improvements
From: Norman Walsh <norm@xxxxxxxxxxxxx>
Date: Wed, 1 Apr 1998 07:13:34 -0500

Hello world,

James suggested that I post an explanation of the changes I made to the DocBook HTML stylesheet to improve its performance, so here's a quick summary.

One of the major differences between the print and HTML stylesheets is that the HTML stylesheet chunks the content up into pieces. The stylesheet then has to construct links between all the chunks. The slowest part of this process is calculating the filename that will be used for any given chunk.

The algorithm used to be that the name of a chunk was a mnemonic for the kind of chunk it was ("c" for chapter, "a" for appendix, etc.) followed by the (element-number) of the chunk. In the case of SECT1 chunks, the (element-number) of the sect1 was appended to the filename calculated for its parent. So, for example, the base filename for the second preface was "f02". The base filename for the third section of the second chapter was "c0203". The fourth section in the first appendix was "a0104", etc.

It turns out that calculating (element-number) is, relatively speaking, very slow. This problem is exacerbated by the fact that filenames have to be calculated not only for navigation, but also for every xref or link.

The solution was to use the (all-element-number) function instead of (element-number). All-element-number is an extension supported by Jade; it efficiently returns the number of the node within all of the elements in the grove. From the point of view of filenames, this has the disadvantage that there's no longer any way to make the filenames meaningful. But that seems like a small price to pay for a performance improvement of at least a factor of five.

The new algorithm for calculating a filename is to append the (all-element-number) of the element to a mnemonic for the type of chunk that it is.

Another wrinkle in filename calculation is that the stylesheet supports PIs to specify the desired filename. This way you can change the filename of the root element from "book01.htm" to "index.html".
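
As a rough illustration (in Python rather than DSSSL, and not the stylesheet's actual code), the old and new naming schemes described above might be modeled as follows. The "f", "c", and "a" mnemonics and the two-digit old-style numbers come from the examples in this message; the "s" mnemonic for SECT1, the unpadded number in the new names, the 1-based numbering, and the Element class and function names are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field

# "f", "c", "a" are taken from the message; "s" for sect1 is an assumption.
MNEMONIC = {"preface": "f", "chapter": "c", "appendix": "a", "sect1": "s"}


@dataclass
class Element:
    """Hypothetical stand-in for an element node in the grove."""
    gi: str
    children: list = field(default_factory=list)


def old_base_name(gi, number, sect1_number=None):
    """Old scheme: mnemonic plus a two-digit number for the component,
    with the sect1's own two-digit number appended for SECT1 chunks."""
    name = f"{MNEMONIC[gi]}{number:02d}"
    if sect1_number is not None:
        name += f"{sect1_number:02d}"
    return name


def number_all_elements(root):
    """Number every element once, in document order (1-based by assumption).
    This models the idea behind all-element-number, not Jade's implementation."""
    numbers = {}
    stack = [root]
    while stack:
        node = stack.pop()
        numbers[id(node)] = len(numbers) + 1
        # Push children reversed so they are popped in document order.
        stack.extend(reversed(node.children))
    return numbers


def new_base_name(element, numbers):
    """New scheme: mnemonic plus the element's number among all elements."""
    return f"{MNEMONIC[element.gi]}{numbers[id(element)]}"


if __name__ == "__main__":
    first, second = Element("sect1"), Element("sect1")
    chapter = Element("chapter", [first, second])
    book = Element("book", [Element("preface"), chapter])
    numbers = number_all_elements(book)
    print(old_base_name("chapter", 2, 3))   # old-style name from the message
    print(new_base_name(chapter, numbers))  # new-style name, no longer meaningful
    print(new_base_name(second, numbers))
```

The new names depend only on an element's position among all elements, which is why they stop being meaningful: adding an element early in the document shifts every number after it.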

In order to find the PIs, I was effectively doing a loop over every child node of the component-level elements. James pointed out that if this loop ran over mixed content, it would be very inefficient. (While DSSSL specifies that every character is a node in the grove, Jade treats the character data in a much more efficient manner _unless_ a DSSSL expression requires it to access individual characters.)

In the case of DocBook, I don't think that my loop ever actually ran over mixed content, but the solution is worth noting: to find PIs, rather than using a loop over the children of a node, loop over (select-by-class (children node) 'pi). The select-by-class function efficiently filters out the PIs.

Those two changes, particularly the use of all-element-number, have made the DocBook HTML stylesheet much more useful for large documents.

Thanks, James!

--norm

DSSSList info and archive: http://www.mulberrytech.com/dsssl/dssslist