RE: sgml-parse and GC

From: "Didier PH Martin" <martind@xxxxxxxxxxxxx>
Date: Sat, 24 Jul 1999 10:21:35 -0400
Hi Sean,

Sean said:
Please consider the following..
1. Each <collection><document> may require a separate catalog.
2. There are performance issues associated with large collections.
<batch temporal='serial' model='SingleProcess'> vs <batch
temporal='parallel' model='MultiProcess'>
For example, should <document href="c:/mydir/mydoc1.sgm"> be able to
reference elements of <document
href="http://www.netfolder.com/mydoc2.xml">
then multi-thread/process implementations of sgml-parse will require
shared memory mechanisms as well?
The 'thread..to process this new grove..' mentioned above
hits this already hideous constraint with an asynchronous ugly stick!

Didier says:
Good point. I do not know whether sgml-parse actually takes the file's
catalog into account. I know that it takes care of the DOCTYPE, but I am not
sure it takes care of the catalog. Normally, it should.

On the second point I agree with you. This is why it will probably be
easier if groves are processed one at a time. So, for example, the processing
loop would look like this:

a) The main grove is being processed. The processing loop encounters a
<document> element, whose rule contains an sgml-parse and a
(process-node-list) command. At this point:
b) the new grove is processed entirely within the processing scope of the
(element document ...) rule. Then:
c) when that processing is completed, the main document's processing loop
resumes.
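
As a rough illustration of steps a) to c), here is a minimal Python sketch
(not DSSSL, and not how any real engine is written). It only models the
control flow: the nested grove is processed to completion inside the rule
for <document>, and only then does the outer loop continue. Every name in it
(Node, FAKE_FILES, parse, process) is hypothetical, and parse is a stand-in
for sgml-parse that looks up a prebuilt tree instead of reading a file.

```python
class Node:
    """A hypothetical grove node: a name, an optional href, and children."""
    def __init__(self, name, href=None, children=()):
        self.name = name
        self.href = href
        self.children = list(children)

# Stand-in for documents on disk: system identifier -> root of its grove.
FAKE_FILES = {
    "mydoc1.sgm": Node("doc1", children=[Node("para")]),
}

def parse(href):
    """Stand-in for sgml-parse: return the root node of a new grove."""
    return FAKE_FILES[href]

def process(node, grove, trace):
    """Serial, depth-first processing loop; records (grove, node name)."""
    trace.append((grove, node.name))
    if node.name == "document" and node.href:
        # Step b): the new grove is processed entirely here, within the
        # scope of the rule for <document>, before anything else runs.
        process(parse(node.href), node.href, trace)
    # Step c): the enclosing grove's loop resumes with the remaining nodes.
    for child in node.children:
        process(child, grove, trace)
    return trace

main = Node("collection", children=[
    Node("document", href="mydoc1.sgm"),
    Node("trailer"),
])
trace = process(main, "main", [])
```

In this model only one grove is ever being walked at a time, so no state
needs to be shared between threads or processes.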

With this algorithm we may have fewer problems. Now our problem is to find
ways to reduce memory paging. The more I study the problem, the more I
see that the issue is not so much having the groves garbage collected as
having elements clustered so that paging is minimized. When that is the case,
performance increases. Of course, this could take a lot of disk space,
but at least the time will be spent processing the document rather than
swapping pages. So I am currently studying the heap pattern of the
parsing->grove module (i.e. the grove manager) to see how it could be improved.
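
To make the clustering argument concrete, here is a toy Python model, not a
measurement of the actual grove manager. It assumes 4096-byte pages,
fixed-size 64-byte nodes, and a worst-case heap in which the nodes of
several groves are interleaved round-robin; all of these figures and names
are assumptions chosen for illustration. It counts how many distinct pages
must be touched to walk every node of one grove.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
NODE_SIZE = 64     # assumed fixed size of one grove node in bytes

def pages_touched(addresses):
    """Number of distinct pages hit when visiting nodes at these addresses."""
    return len({addr // PAGE_SIZE for addr in addresses})

def interleaved_layout(n_groves, nodes_per_grove):
    """Nodes of different groves alternate in the heap (round-robin)."""
    return {
        g: [(i * n_groves + g) * NODE_SIZE for i in range(nodes_per_grove)]
        for g in range(n_groves)
    }

def clustered_layout(n_groves, nodes_per_grove):
    """Each grove's nodes occupy one contiguous region of the heap."""
    return {
        g: [(g * nodes_per_grove + i) * NODE_SIZE
            for i in range(nodes_per_grove)]
        for g in range(n_groves)
    }

# Walk every node of grove 0 under each layout: 8 groves, 1024 nodes each.
scattered = pages_touched(interleaved_layout(8, 1024)[0])
clustered = pages_touched(clustered_layout(8, 1024)[0])
```

Under these assumptions, walking grove 0 touches 128 pages in the
interleaved layout and 16 pages in the clustered one, which is the kind of
difference that clustering is meant to exploit.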

regards
Didier PH Martin
mailto:martind@xxxxxxxxxxxxx
http://www.netfolder.com


 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

