RE: sgml-parse and GC

Subject: RE: sgml-parse and GC
From: "Didier PH Martin" <martind@xxxxxxxxxxxxx>
Date: Tue, 20 Jul 1999 19:48:45 -0400
Hi Peter,

Peter said:
sgml-parse works by asking the GroveManager to load the entity with the
system id sysid then returning a NodePtrNodeListObj pointing to the root
of the new grove. This NodePtrNodeListObj is GC'd like any ELObj, so there is
no problem with garbage collection I think.

The reason the grove stays in memory is that the GroveManager (actually
DssslApp) caches the groves resulting from parsing documents. If the
groves are to be removed, you ought to be quite sure they won't be used
again during the same run of jade. How would one know this in advance?

Didier says:
I saw what you noticed in the code, but I didn't know why the grove is not
garbage collected. So, your observation (as a good OpenJade archeologist) led
you to conclude that DssslApp prevents garbage collection and forces the grove
to stay in memory, and thus implements a grove cache. Thanks for the info,
I'll now investigate DssslApp.
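
To make that mechanism concrete, here is a minimal sketch in Python (not
Jade's actual C++ code) of how a cache that holds a strong reference keeps an
object alive after the caller has dropped its own handle. The names Grove,
GroveCache, release and "doc.sgm" are hypothetical stand-ins chosen for
illustration only.

```python
import gc
import weakref


class Grove:
    """Hypothetical stand-in for a parsed document tree."""

    def __init__(self, sysid):
        self.sysid = sysid


class GroveCache:
    """Hypothetical cache keyed by system id (the role described for DssslApp)."""

    def __init__(self):
        self._groves = {}

    def load(self, sysid):
        # Parse only once; later requests get the cached grove.
        if sysid not in self._groves:
            self._groves[sysid] = Grove(sysid)
        return self._groves[sysid]

    def release(self, sysid):
        # Drop the cache's reference so the collector may reclaim the grove.
        self._groves.pop(sysid, None)


def grove_is_alive_after_drop(release_from_cache):
    """Report whether the grove survives once the caller's handle is gone."""
    cache = GroveCache()
    grove = cache.load("doc.sgm")
    probe = weakref.ref(grove)  # observes the grove without keeping it alive
    del grove                   # the caller's handle is dropped
    if release_from_cache:
        cache.release("doc.sgm")
    gc.collect()
    return probe() is not None
```

With the cache entry left in place the grove stays reachable, so the collector
cannot reclaim it; once the entry is released it can.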

About grove caching, I am not so sure that keeping a grove is a good thing.
For example, the main grove (i.e. the grove created for the processed
document) could be released as soon as the (process-children) procedure has
finished on the root. The same goes for a grove returned from sgml-parse,
where processing is finished once (process-node-list) on the grove's root
has finished. In both cases the grove could be released, because the FOT is
complete once processing of the root element (and therefore of all its
children) is complete. The default for the processor could then be to have no
cache, with each grove garbage collected as soon as its processing is
finished. If necessary, a switch could indicate that caching is required, and
then all groves would be kept in memory. We would then be able to process
collections of large documents.
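
As a rough illustration of that proposal, here is a small Python sketch
(hypothetical, not how Jade is implemented) of a processor with such a switch.
The names Processor, cache_groves and peak_groves are assumptions made up for
this example, and the "processing" step is only a placeholder for running
(process-children) on the root.

```python
class Processor:
    """Hypothetical processor with an optional grove cache."""

    def __init__(self, cache_groves=False):
        self.cache_groves = cache_groves
        self.live = {}      # sysid -> grove currently held in memory
        self.peak_live = 0  # largest number of groves held at once

    def _load(self, sysid):
        grove = self.live.get(sysid)
        if grove is None:
            grove = {"sysid": sysid}  # placeholder for a parsed grove
            self.live[sysid] = grove
            self.peak_live = max(self.peak_live, len(self.live))
        return grove

    def process(self, sysid):
        grove = self._load(sysid)
        # Placeholder for processing the root and all of its children.
        output = "FOT for " + grove["sysid"]
        if not self.cache_groves:
            # Default: release the grove once its root has been processed.
            del self.live[sysid]
        return output


def peak_groves(sysids, cache_groves):
    """Process documents one after another and report the peak grove count."""
    processor = Processor(cache_groves=cache_groves)
    for sysid in sysids:
        processor.process(sysid)
    return processor.peak_live
```

In this sketch, processing three documents in sequence holds at most one grove
at a time without the cache and all three with it. It deliberately ignores
nested sgml-parse calls, which would keep more than one grove alive at once.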

Peter said:
BTW, the nodes in the grove have to stay accessible until the FOT is
built. This I think is true for all nodes resulting in something in the
FOT. See FOTBuilder::startNode()/endNode().

So my conclusion is that you'll need a lot of virtual memory (or other
storage for the groves) to process large documents. I don't see how to
make this different. (Of course you may have the groves in a database.)

Didier says:
The grove has to stay present until processing of its root node is complete,
that is, until all of the root's children have been processed. Thus, at
least two groves could be present at a time:
a) the source document grove
b) the grove resulting from sgml-parse.

Of course, some scripts may lead to a situation where more than two groves
are present simultaneously, which would require a lot of virtual memory
(and cause swapping). Speaking of swapping, it depends a lot on how we
create objects in the heap. If objects are created as close together as
possible, swapping is kept to a minimum; if objects are spread too far apart,
swapping increases. Thus, the object allocation strategy could improve object
access performance by minimizing paging of virtual memory. Some products,
like SmartHeap, do that.

regards
Didier PH Martin
mailto:martind@xxxxxxxxxxxxx
http://www.netfolder.com



 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

