Subject: RE: sgml-parse and GC
From: "Didier PH Martin" <martind@xxxxxxxxxxxxx>
Date: Thu, 22 Jul 1999 06:07:55 -0400

Hi Peter,

> Didier says:
> About grove caching, I am not so sure that keeping a grove is a good thing.
> For example, the main grove (i.e. the grove created for the processed
> document) could be released as soon as the (process-children) procedure is
> finished on the root. Same thing for a grove returned from a sgml-parse

Peter said:
Then the whole FOT is constructed, and all DSSSL processing done, so this won't buy you much.

Didier says:
On the contrary, this does buy you something. Again, take the example where you want to process a collection of documents. All the documents are listed in an SGML/XML document such as:

  <collection>
  <document href="c:/mydir/mydoc1.sgm">
  <document href="http://www.netfolder.com/mydoc2.xml">
  </collection>

A DSSSL script then processes this source document and contains a rule for the document element such as:

  (element document
    (process-node-list
      (sgml-parse (attribute-string "href" (current-node)))))

A thread can then be started to process this new grove (as an autonomous entity) until its root and all of its children are processed. For each document element we would have a separate grove, and each grove would be processed in a separate thread. This way, a batch job that processes a collection of documents can be expressed as an SGML document itself instead of a platform-dependent batch file. For example, a "bat" file that I create on Windows won't work on Linux, but a batch-processing file expressed in SGML or XML is portable to any platform running OpenJade.

If the target machine does not have a lot of memory, then instead of processing each grove in its own thread, the groves can be processed in the main thread and so, de facto, one at a time. This implies that only two groves are present in memory at the same time:

a) the source grove
b) the independent grove

If the source grove is relatively small (it contains only the document collection to be processed), then most of the memory is left for the independent grove.
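
The one-at-a-time case described above can be sketched outside DSSSL. What follows is only an illustration in Python, not OpenJade code: the Grove class, sgml_parse, process_collection and the file name collection.sgm are all hypothetical stand-ins, and the "parser" does no real parsing. It just counts how many groves are alive while the collection is walked in a single thread.

```python
import gc
import weakref


class Grove:
    """Hypothetical stand-in for a parsed document tree (not OpenJade's grove)."""

    live = weakref.WeakSet()  # groves that are still reachable

    def __init__(self, system_id, hrefs=()):
        self.system_id = system_id
        self.hrefs = list(hrefs)  # targets of the <document href="..."> elements
        Grove.live.add(self)


def sgml_parse(system_id):
    """Pretend to parse a document; a real parser would read the file or URL."""
    return Grove(system_id)


def process_collection(source, process):
    """Process every referenced document in the main thread, one at a time.

    Returns the largest number of groves that were alive at once.
    """
    peak = 0
    for href in source.hrefs:
        grove = sgml_parse(href)           # the independent grove
        process(grove)
        peak = max(peak, len(Grove.live))  # source grove + current grove
        del grove                          # drop the only reference
        gc.collect()                       # for runtimes without refcounting
    return peak


# Assumed example data, mirroring the <collection> document above.
source = Grove("collection.sgm",
               ["c:/mydir/mydoc1.sgm", "http://www.netfolder.com/mydoc2.xml"])
processed = []
peak = process_collection(source, lambda g: processed.append(g.system_id))
print(processed, peak)
```

As long as the processing callback keeps no reference to the grove it is given, the peak stays at two: the source grove plus the one independent grove currently being processed.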

So yes, we gained something: a platform-independent expression language for DSSSL batch processing. Better than that, the expression language for the batch processing is an SGML application!

> where the process is finished when the (process-node-list) on the grove's
> root is finished. In both cases, the grove could be released because in both
> ways, the FOT is completed because the processing on the root element is
> completed ( and therefore for all its children). Then a default condition

Peter said:
Since the values of the characteristics of the resulting sosofo are not evaluated immediately, this is not quite correct. The FOT has to be built so that all characteristic expressions can be evaluated before the grove is removed from memory (because current-node might be used in a characteristic specification). Since flow objects might "bubble up" in the resulting FOT (if they are labeled and the label doesn't correspond to any port in the content-map for the constructed sosofo), a characteristic might in the end be evaluated above the constructed flow object. I think the refcounters will solve these problems, but then if you don't cache the nodes you might get two (or more) versions of the same grove in memory at the same time if you call sgml-parse in more than one place.

Another problem arises if you remove the groves: what should

  (node-list=? (sgml-parse foo) (sgml-parse foo))

return? If it has to return true, isn't that hard to implement if you don't keep the groves?

Didier says:
Good points. Then we probably need to introduce an explicit construct stating that the grove is released as soon as process-children is done on the whole grove. The script writer knows the processing context. So maybe a construct like (process-and-release-node-list), or something similar, would do the job.

I agree that the current implementation has limitations on this side, and that platform-independent batch processing expressed as an SGML application cannot realistically be done with it. To avoid the situation you described, a new construct like (process-and-release-node-list) would resolve the problem. The DSSSL expression stated earlier would then become:

  (element document
    (process-and-release-node-list
      (sgml-parse (attribute-string "href" (current-node)))))

In this case, as soon as the node list is processed, it is released. This is, naturally, only one way to implement it. By discussing it, we may find a better construct (this one has, at least, the advantage of being explicit and self-explanatory).

I know, the original language conception is based on infinite resources, infinite time, infinite... But we, as mortals, have to live in limited worlds, and sometimes our expression tools should reflect this world full of limitations. Usually, when that is the case, something useful emerges.

> Peter said:
> BTW, the nodes in the grove have to stay accessible until the FOT is
> built. This I think is true for all nodes resulting in something in the
> FOT. See FOTBuilder::startNode()/endNode().
>
> So my conclusion is that you'll need a lot of virtual memory (or other
> storage for the groves) to process large documents. I don't see how to
> make this different. (Of course you may have the groves in a database.)
>
> Didier says:
> The grove has to be present as long as the processing is not completed for
> the root node and therefore not until all its children are processed. Thus,
> at least two groves could be present at a time:
> a) the source document grove
> b) the sgml-parse resultant grove.
>
> Of course, some scripts may lead to a situation where more than two groves
> are simultaneously present and then would require a lot of virtual memory
> (and then cause swapping).
> Speaking of swapping, it depends a lot on how we

Peter said:
If you want to process large documents and want to be able to navigate arbitrarily through them (which DSSSL requires), then you will need a lot of memory. How else would it be? Maybe the grove implementation could be optimized better for memory, but I don't think so, since James Clark probably spent a lot of effort in this critical area.

Didier says:
Don't forget, Peter, that we are now defining the future of DSSSL. With this in mind, we can add new useful constructs and then bring them to ISO as a draft, but this time with practical implementation and experience behind us. A new construct could then be included, enabled with the -2 flag (for DSSSL-2), and used as an experimental construct for a future DSSSL standard. Yes, James put in a lot of effort, but so do we (the OpenJade team), and in our case we want a future for OpenJade and, better than that, a new DSSSL-2 international standard that includes what we learned from practice.

Regards,
Didier PH Martin
mailto:martind@xxxxxxxxxxxxx
http://www.netfolder.com

DSSSList info and archive: http://www.mulberrytech.com/dsssl/dssslist