RE: sgml-parse and GC

Subject: RE: sgml-parse and GC
From: Peter Nilsson <pnidv96@xxxxxxxxxxxxxx>
Date: Thu, 22 Jul 1999 10:19:02 +0200 (CEST)

On Tue, 20 Jul 1999, Didier PH Martin wrote:

> Didier says:
> About grove caching, I am not so sure that keeping a grove is a good thing.
> For example, the main grove (i.e. the grove created for the processed
> document) could be released as soon as the (process-children) procedure is
> finished on the Root . Same thing for a grove returned from a sgml-parse

By then the whole FOT has been constructed and all DSSSL processing is
done, so this won't buy you much.

> where the process is finished when the (process-node-list) on the grove's
> root is finished. In both cases, the grove could be released because in both
> ways, the FOT is completed because the processing on the root element is
> completed ( and therefore for all its children). Then a default condition

Since the values of the characteristics of the resulting sosofo are not
evaluated immediately, this is not quite correct. The FOT has to be built
so that all characteristic expressions can be evaluated before the grove
is removed from memory (because current-node might be used in a
characteristic specification). Since flow objects might "bubble up" in the
resulting FOT (if they are labeled and the label doesn't correspond to any
port in the content-map for the constructed sosofo), a characteristic
might end up being evaluated above the constructed flow object. I think
the refcounters will solve these problems, but then if you don't cache the
nodes you might get two (or more) copies of the same grove in memory at
the same time if you call sgml-parse in more than one place.

Another problem arises if you remove the groves. What should
(node-list=? (sgml-parse foo) (sgml-parse foo))
return? If it has to return true, isn't it hard to implement if you don't
keep the groves?
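
To make the trade-off concrete, here is a minimal sketch in Python (not
DSSSL, and not how Jade does it) of a cache that hands out one shared
grove per system identifier but holds it only weakly, so that refcounting
can still release it. The names Grove, GroveCache, sgml_parse and
live_groves are hypothetical and exist only for this illustration; the
"parse" is a stand-in that just creates an object.

```python
import gc
import weakref


class Grove:
    """Stand-in for a parsed grove; a real one would hold the node tree."""

    def __init__(self, system_id):
        self.system_id = system_id


class GroveCache:
    """Hands out one shared Grove per system id while anyone still uses it."""

    def __init__(self):
        # Weak values: an entry disappears once the last strong reference
        # to its Grove is dropped, so the cache never keeps a grove alive.
        self._groves = weakref.WeakValueDictionary()
        self.parse_count = 0

    def sgml_parse(self, system_id):
        grove = self._groves.get(system_id)
        if grove is None:
            self.parse_count += 1  # a real implementation would parse here
            grove = Grove(system_id)
            self._groves[system_id] = grove
        return grove

    def live_groves(self):
        return len(self._groves)


cache = GroveCache()
a = cache.sgml_parse("foo")
b = cache.sgml_parse("foo")
same_grove = a is b  # both calls see the same grove, parsed only once

del a, b
gc.collect()
live_after_release = cache.live_groves()  # nothing pins the grove any more
```

With something like this, the two calls in the node-list=? example would
compare equal as long as the first result is still referenced. If the
grove has already been released, a later call parses again, but then no
old node list is left to compare against.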

> Peter said:
> BTW, the nodes in the grove have to stay accessible until the FOT is
> built. This I think is true for all nodes resulting in something in the
> FOT. See FOTBuilder::startNode()/endNode().
> 
> So my conclusion is that you'll need a lot of virtual memory (or other
> storage for the groves) to process large documents. I don't see how to
> make this different. (Ofcourse you may have the groves in a database.)
> 
> Didier says:
> The grove has to be present as long as the processing is not completed for
> the root node and therefore not until all its children are processed. Thus,
> at least two groves could be present at a time:
> a) the source document grove
> b) the sgml-parse resultant grove.
> 
> Off course, some scripts may lead to a situation where more than two groves
> are simultaneously present and then would require a lot of virtual memory
> (and then cause swapping). Speaking of swapping, it depend a lot of how we

If you want to process large documents and want to be able to navigate
arbitrarily through them (which DSSSL requires), then you will need a lot
of memory. How else would it be? Maybe the grove implementation could be
optimized better for memory, but I don't think so, since James Clark
probably spent a lot of effort on this critical area.

regards,
/Peter Nilsson

--
'(?P . (?e . (?t . (?e . (?r)))))


 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist