RE: About Constructions rules

Subject: RE: About Constructions rules
From: Avi Kivity <Avi@xxxxxxxxxxxxx>
Date: Thu, 15 Jul 1999 23:25:58 +0300
On Thursday, July 15, 1999 22:34, Brandon Ibach [SMTP:bibach@xxxxxxxxxxxxxx]
wrote:
>    Valid point, but unless I'm being naive here (which is a reasonable
> possibility :), I don't think it's insurmountable.  The idea of being
> able to start processing before the entire document is parsed is very
> nice, but does Jade really do this?

I believe 1.2.1 does, at least on Windows (using something based on 1.1.1
myself). There is a parser thread and a processing thread. Grep for Thread
and you'll see them too.

>    It would have to happen one of two ways.  Jade would have to be
> multi-threaded, with separate threads for the parsing (and grove
> building) and processing.  I'm fairly certain this is not the case.
> The alternative would be that Jade parses the document on an as-needed
> basis.  It's possible that this could be happening, but is it?
> Matthias, Didier, Avi, etc?  Can you answer this?
>    Whichever (if either) the case is, I think there's a way to fit
> query construction rules in without needing to have the entire
> document parsed before you evaluate it.  Just as the parsing is done
> piece by piece, could not the query evaluation be done as well?  In
> other words, as much as possible, build up the result node list as you
> build up the source grove?  Keep in mind, the node list as a whole is
> irrelevant to the query rule.  All we care at any given point is
> whether a certain node is in the list, and if we're to the point of
> processing that node, chances are (though not certain) that enough of
> the document has been parsed to have been able to determine whether
> that node is in the result list of the query.

There are usually more than one node eligible for processing (e.g. during
evaluation of (process-children)). So if one of them waits due to a query
requiring complete traversal of the grove, so what? There's plenty of work
to feed a theoretical multithreaded dsssl processor.

Node lists are lazy, so the query can return *immediately* after its called.
It must only be (partially) computed when the first node is processed. This
is not related directly to queries -- normal processing can also do this:

(element x
    (let (
            (r  (with-mode rain     (process-children)))
            (s  (with-mode sunshine (process-children)))
         )
         (if (raining?) r s)
    ) 
)

On first sight, it looks as if both r and s are evaluated fully, but a
clever processor need not do that. It can (and perhaps does?) just keep a
pointer to the process-children expression with its context. Only when the
result of processing node x is incorporated into the flow object tree does
one of {r, s} have to eb evaluated. And if it is discarded, so be it.

When (process-children) is evaluated, the same applies: the processor can
evaluate whatever nodes are available immediately, and wait for the parser
to generate other nodes.

What we have here is some tree-structured variant of the unix pipeline: the
processing threads wait for the parser, and the formatting thread (which
builds the FOT) waits for the processing threads. The processing threads are
not started until their output is actually needed.

           /--processor 1--\
parse------+--processor 2--+--formatter-->
           \--processor 3--/


(This is all theoretical, of course, but I don't think it would be *too*
difficult to implement.)

>    The disclaimer here is that this is highly dependent upon the
> nature of the query and how it is written.  Having just done a crash
> course in relational query optimization (a la Oracle), I know that
> there are many, many ways to break a query such that it can't return
> results as it finds them.  For instance, if the final results of the
> query are to be sorted, then you won't get any results back from it
> until the query is complete.  The engine needs to see *all* of the
> results in order to return any, because of the sort.
>    Again, this is all quite theoretical.  Does Jade actually begin
> processing anything before the source document has been completely
> parsed and "groved"?  If not, then it doesn't really matter.
> 
It does, but I can't see any big win in it. Having two threads performing
asymetric tasks yields a 100% improvement in speed if parsing and processing
are evenly matched, but much less if they are not. And it doesn't scale with
more than two processors.

Now, parsing is inherently serial. Processing can conceptually be
parallelized due to the side-effect-freeness, but of course depends on the
parser. High-quality formatting (read: page-sequence) is probably serial in
nature too, mostly.

Multithreading can benefit if (a) processing is so complex it outweighs
parsing time or (b) you have the grove in a random access database.

In any case, waiting for the parser to return the last node can affect
wall-clock time, but not the computational complexity which is more
important (right?).

---
"The only words which have meaning are the last ones spoken"



 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist


Current Thread