RE: sgml-parse and GC

Subject: RE: sgml-parse and GC
From: Avi Kivity <Avi@xxxxxxxxxxxxx>
Date: Thu, 22 Jul 1999 19:04:08 +0300

On Thursday, July 22, 1999 18:07, Peter Nilsson
[SMTP:pnidv96@xxxxxxxxxxxxxx] wrote:
> 
> Maybe sgml-parse-no-cache (as an external procedure) would be easier,
> since it is the grove that is cached (not node lists in general). It
> would:
> - Check if sysid is in cache and return the cached grove.
> - Otherwise load it but not put it in the cache.
> 
> Hopefully, the reference counting mechanism will keep the nodes in memory
> as long as necessary, but I'm not sure.

It will, and thus defeat your intent.

There are many ways for a node pointer to end up held somewhere. For
example, it is passed as an argument to FOTBuilder::charactersFromNode() (or
something like it). So if you call (process-children) on a node with
characters, then that node, and with it the entire grove (since a grove is
reachable from any of its nodes), will be kept in memory whether or not it
is cached.
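
To make the point above concrete, here is a minimal sketch using
std::shared_ptr as a stand-in for the reference counting. Grove, Node and
liveGrovesWithStrayNode() are hypothetical names invented for this
illustration; they are not Jade's real classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration; not Jade's real classes.
struct Grove {
    static int live;              // number of Grove objects currently alive
    Grove()  { ++live; }
    ~Grove() { --live; }
};
int Grove::live = 0;

struct Node {
    std::shared_ptr<Grove> grove; // every node keeps its grove reachable
};

// Returns the number of live groves while one stray node pointer still
// exists after the cache's own reference has been dropped.
int liveGrovesWithStrayNode() {
    auto cache = std::make_shared<Grove>();            // the cache's reference
    auto stray = std::make_shared<Node>(Node{cache});  // e.g. held by a callback
    assert(Grove::live == 1);
    cache.reset();                                     // grove is no longer cached
    return Grove::live;                                // the stray node pins it
}
```

Only when the last node pointer goes away (here, when the function returns)
is the grove actually destroyed.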

If that can be worked around, then I would suggest removing groves from the
cache when memory pressure increases. This isn't dangerous: if a grove is
being processed, then some node of it is still referenced, and that
reference keeps the grove in memory.
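
A rough sketch of that eviction idea, again with std::shared_ptr standing in
for the reference counts. GroveCache, evictAll() and the sysids are
assumptions made up for the example, and "parsing" is reduced to
constructing an object.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Grove {                        // hypothetical stand-in, not Jade's class
    explicit Grove(std::string id) : sysid(std::move(id)) {}
    std::string sysid;
};

class GroveCache {
public:
    // Return the cached grove for sysid, constructing it on a miss.
    std::shared_ptr<Grove> get(const std::string& sysid) {
        auto it = cache_.find(sysid);
        if (it != cache_.end()) return it->second;
        auto grove = std::make_shared<Grove>(sysid);
        cache_[sysid] = grove;
        return grove;
    }
    // Called under memory pressure: drops only the cache's own references.
    void evictAll() { cache_.clear(); }
    std::size_t size() const { return cache_.size(); }
private:
    std::map<std::string, std::shared_ptr<Grove>> cache_;
};

// True if a grove still being processed survives eviction while an idle
// grove is freed.
bool evictionIsSafe() {
    GroveCache cache;
    std::shared_ptr<Grove> inUse = cache.get("a.sgml");  // caller holds this one
    std::weak_ptr<Grove> idle = cache.get("b.sgml");     // only the cache holds it
    assert(cache.size() == 2);
    cache.evictAll();
    return idle.expired() && inUse.use_count() == 1 && inUse->sysid == "a.sgml";
}
```

One cost this sketch ignores: after eviction, a second get("a.sgml") would
build a duplicate grove while the first is still alive.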

As to databases: that's a win only if you can increase locality of
reference. Paging in a node from virtual memory costs only one disk access.
Fetching it from a database will cost more. The only way to gain is to group
together nodes that are accessed together.

Perhaps one way to gain a little is to allocate nodes, expression language
objects, and flow objects from separate heaps. This way, the grove (which
tends to be reused most) occupies fewer pages, because its nodes are not
interleaved with the other objects.
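
As an illustration of the separate-heaps idea, here is a minimal
bump-pointer arena. Arena, GroveNode and FlowObject are hypothetical names
for this sketch, and a real allocator would also need growth and
destruction, which are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Minimal bump-pointer arena: objects made from one arena sit next to
// each other in one buffer. Memory is released only with the arena.
class Arena {
public:
    explicit Arena(std::size_t bytes)
        : buf_(new unsigned char[bytes]), size_(bytes), used_(0) {}

    template <class T, class... Args>
    T* make(Args&&... args) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "this arena never runs destructors");
        // Round the current offset up to T's alignment.
        std::size_t start = (used_ + alignof(T) - 1) / alignof(T) * alignof(T);
        if (start + sizeof(T) > size_) throw std::bad_alloc();
        used_ = start + sizeof(T);
        return new (buf_.get() + start) T{std::forward<Args>(args)...};
    }
private:
    std::unique_ptr<unsigned char[]> buf_;
    std::size_t size_;
    std::size_t used_;
};

struct GroveNode  { int id; };        // hypothetical node type
struct FlowObject { double width; };  // hypothetical flow object type

// True if two nodes allocated around a flow object are still adjacent.
bool nodesStayAdjacent() {
    Arena nodeHeap(1024), flowHeap(1024);
    GroveNode*  n1 = nodeHeap.make<GroveNode>(1);
    FlowObject* f1 = flowHeap.make<FlowObject>(10.0);  // goes to the other heap
    GroveNode*  n2 = nodeHeap.make<GroveNode>(2);
    assert(f1->width == 10.0);
    std::ptrdiff_t gap = reinterpret_cast<unsigned char*>(n2) -
                         reinterpret_cast<unsigned char*>(n1);
    return gap == static_cast<std::ptrdiff_t>(sizeof(GroveNode));
}
```

With a single shared heap, the flow object would land between the two nodes
and spread the grove over more memory.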

Another matter which worries me is fully integrated formatters. Since they
will have to resolve cross references, they will probably have to keep the
entire FOT in memory, or do multiple passes until convergence (a two-pass
solution may not suffice).
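
To show why two passes may not be enough, here is a toy model, not a real
formatter. It assumes a cross reference prints its target's page number, so
its width depends on the digit count, which can in turn move the target.
Item, layout(), passesToConverge() and exampleDocument() are all invented
for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Item {
    int width;   // width of the item's own text, in characters
    int target;  // index of the item it references, or -1 for none
};

// Place items in order on pages holding `capacity` characters. refPages
// gives the page numbers assumed when rendering cross references.
std::vector<int> layout(const std::vector<Item>& items,
                        const std::vector<int>& refPages, int capacity) {
    assert(capacity > 0);
    std::vector<int> pages(items.size());
    int page = 1, used = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        int w = items[i].width;
        if (items[i].target >= 0)  // add the digits of the referenced page
            w += static_cast<int>(
                std::to_string(refPages[items[i].target]).size());
        if (used > 0 && used + w > capacity) { ++page; used = 0; }
        used += w;
        pages[i] = page;
    }
    return pages;
}

// Repeat layout until the page numbers stop changing. Returns the number
// of passes used (including the confirming one), or -1 if maxPasses is hit.
int passesToConverge(const std::vector<Item>& items, int capacity,
                     int maxPasses) {
    std::vector<int> assumed(items.size(), 1);
    for (int pass = 1; pass <= maxPasses; ++pass) {
        std::vector<int> actual = layout(items, assumed, capacity);
        if (actual == assumed) return pass;
        assumed = actual;
    }
    return -1;
}

// A reference to the last item, one filler item, then nine full pages.
std::vector<Item> exampleDocument() {
    std::vector<Item> doc = {{8, 10}, {1, -1}};
    for (int i = 0; i < 9; ++i) doc.push_back({10, -1});
    return doc;
}
```

In this example the target first lands on page 10; printing "10" instead of
a single digit pushes it to page 11, so the second pass is still wrong and a
third is needed. Nothing guarantees convergence in general, hence the
maxPasses limit.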

- Avi


 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

