RE: sgml-parse and GC

Subject: RE: sgml-parse and GC
From: Peter Nilsson <pnidv96@xxxxxxxxxxxxxx>
Date: Thu, 22 Jul 1999 21:27:40 +0200 (CEST)
On Thu, 22 Jul 1999, Avi Kivity wrote:

> On Thursday, July 22, 1999 18:07, Peter Nilsson
> [SMTP:pnidv96@xxxxxxxxxxxxxx] wrote:
> > 
> > Hopefully, the reference counting mechanism will keep the nodes in memory
> > as long as necessary, but I'm not sure.
> 
> It will, and thus defeat your intent.
> 
The reason I wasn't sure is that there certainly are cycles in the grove
and I haven't checked how this problem is solved. (Reference counting
doesn't work with circular links, but this is a well-known fact.) However,
this is probably solved; I can check how it's done myself :-)
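
To make the cycle problem concrete, here is a minimal sketch in C++. It is
not Jade's grove code: StrongNode, WeakNode and both functions are
hypothetical names, and std::shared_ptr merely stands in for whatever
reference counting the grove implementation really uses. Making the back
link uncounted is one common fix; I am not claiming it is the one the
grove uses.

```cpp
#include <memory>

int strongLive = 0;  // StrongNode objects currently alive
int weakLive = 0;    // WeakNode objects currently alive

// Hypothetical node whose parent link is counted, so parent and child
// reference each other and form a cycle.
struct StrongNode {
    std::shared_ptr<StrongNode> parent;
    std::shared_ptr<StrongNode> child;
    StrongNode() { ++strongLive; }
    ~StrongNode() { --strongLive; }
};

// Same shape, but the back link to the parent is not counted.
struct WeakNode {
    std::weak_ptr<WeakNode> parent;
    std::shared_ptr<WeakNode> child;
    WeakNode() { ++weakLive; }
    ~WeakNode() { --weakLive; }
};

// Build a parent/child pair, drop the only outside reference, and
// return how many nodes are still alive afterwards.
int leakedWithCountedBackLinks() {
    {
        auto root = std::make_shared<StrongNode>();
        root->child = std::make_shared<StrongNode>();
        root->child->parent = root;  // closes the cycle
    }
    return strongLive;
}

int leakedWithWeakBackLinks() {
    {
        auto root = std::make_shared<WeakNode>();
        root->child = std::make_shared<WeakNode>();
        root->child->parent = root;  // back link does not add a count
    }
    return weakLive;
}
```

On a first call, leakedWithCountedBackLinks() returns 2 (both nodes stay
alive although nothing outside refers to them), while
leakedWithWeakBackLinks() returns 0.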

> There are many ways for a node pointer to find itself hooked somewhere. For
> example, it is an argument to FOTBuilder::charactersFromNode() (or
> something). So if you call (process-children) on a node with characters,
> then that node, and its entire grove (since a grove is reachable from any of
> its nodes), will be kept in memory, whether or not it is cached.
> 
There are also startNode/endNode calls for every node (or chunk).

> If that can be worked around, then I would suggest removing groves from the
> cache when memory pressure increases. This isn't dangerous because if a
> grove is being processed then some node or other is in memory.
> 
If the backend saves the node pointers, removing the grove from the cache
won't free it. But I believe the current RTF and TeX backends don't.
Instead they (at least the TeX backend) use allElementNumber to get unique
ids for the nodes.
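
A rough sketch of the difference, again in C++ and again with made-up
names (Grove, NodeHandle, the "doc.sgml" key and the cache map are all
assumptions for illustration, not Jade's classes): a backend that stores
a node handle pins the whole grove even after it leaves the cache, while
a backend that copies out only a number does not.

```cpp
#include <map>
#include <memory>
#include <string>

int grovesAlive = 0;  // Grove objects currently alive

// Hypothetical stand-in for a parsed grove.
struct Grove {
    Grove() { ++grovesAlive; }
    ~Grove() { --grovesAlive; }
};

// Hypothetical node handle: holding one keeps its whole grove alive.
struct NodeHandle {
    std::shared_ptr<Grove> grove;
    unsigned long elementNumber;  // stand-in for an element-number id
};

// Backend saves the handle itself, then the cache is emptied.
int grovesAfterEvictionHoldingPointer() {
    std::map<std::string, std::shared_ptr<Grove>> cache;
    cache["doc.sgml"] = std::make_shared<Grove>();
    NodeHandle saved{cache["doc.sgml"], 7};
    cache.clear();
    return grovesAlive;  // saved still references the grove here
}

// Backend copies only the number and lets the handle go.
int grovesAfterEvictionHoldingId() {
    std::map<std::string, std::shared_ptr<Grove>> cache;
    cache["doc.sgml"] = std::make_shared<Grove>();
    unsigned long savedId = 0;
    {
        NodeHandle node{cache["doc.sgml"], 7};
        savedId = node.elementNumber;
    }
    cache.clear();
    (void)savedId;  // the id stays usable without the grove
    return grovesAlive;
}
```

Here grovesAfterEvictionHoldingPointer() returns 1 and
grovesAfterEvictionHoldingId() returns 0, which is the whole point:
removing the grove from the cache only releases memory if nobody kept a
pointer.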

> Another matter which worries me is fully integrated formatters. Since they
> will have to resolve cross references, they will probably have to keep the
> entire FOT in memory, or do multiple passes until convergence (a two-pass
> solution may not suffice).
> 
If the node pointers are only used as ids, you could do as the TeX
backend does. But if you want to support the general-indirect-sosofo (and
I want to in my formatter), the FOT has to stay in memory until the flow
objects are on paper.

So, to wrap this up, I think we could just remove the caching (by a
switch or an external procedure, whatever) and sort of "see what happens."
In some special situations you would gain from it, but not in general.

Regards,
/Peter Nilsson

--
'(?P . (?e . (?t . (?e . (?r)))))


 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

