Re: [xsl] help with random number generation

Subject: Re: [xsl] help with random number generation
From: "C. M. Sperberg-McQueen cmsmcq@xxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 18 Nov 2022 18:48:07 -0000
Dimitre,

thank you for your note.  You write:

> Or maybe I don't understand the requirements well?

I think you probably understand the requirements at least as well as I
do -- although as I have struggled with this problem my understanding
has certainly changed.

Ultimately, my goal is to have a simulation that allows me to estimate
reliably the probability of certain classes of result; I think that
means I need good random sequences of numbers to guide the simulations.
The list of requirements I managed to formulate in response to Martin
Honnen's useful remark the other day identifies several ways in which I
believe a sequence can fail to be good.  But who knows? I may find more
ways for sequences to be bad as I continue my work.

"Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> writes:

> Michael,
>
> Reading the requirements you listed, it seems to me that you need:
>
>       Num-TimeSlices * NumNodesPerTimeSlice  
>
> different random numbers.

I think that is correct.  One practical snag is that I do not know in
advance how many nodes there will be for a given time slice, until I get
to that time slice.  The maximum number for any timeslice would be two
to n, where n is 1 for the first timeslce, 2 for the second, and so on.

> Why not generate a sequence of random numbers with this length?
>
> This will require a single seed, that you could use based on the
> current time (the start of the simulation).
>
> Thus, for every simulation there will be a different seed.

I expect to take this advice.  It wasn't my first approach, because
generating the sequence in advance and then indexing into it seemed more
complicated than calling a random-number generator at the point where
the number was needed. 

My current conjecture is just that if I need n random numbers,
generating that sequence from one seed is likely to produce a good
sequence, and generating n/2 sequences from the keys I can generate, and
taking the first and second (or the third and fourth) item of each
sequence, is likely to produce a bad sequence.

At the moment, I have plans for several further steps.

First, try to gain better understanding of the behavior of my current
code (and diagnosis of its problems, if in fact I am correct in
believing that the current behavior has problems):

  - make the simulation dump the sequence of numbers actually used

  - apply some basic statistical tests to the sequences being generated;
    if I understand things correctly it is not possible to prove that a
    given sequence is random, but it is possible to show that a
    sequence appears non-random, and that a set of sequences is unlikely
    to result from a random process

  - calculate a few more expected probabilities so that I can check the
    plausibility of a given set of simulation runs.

    Your method of using Monte Carlo results to estimate a known value
    is very useful here.  I have estimated, and then re-estimated, the
    probability of my a particular result given particular parameters,
    and my main evidence that my simulation is not currently behaving
    well is that it is not converging on the expected number.

    It would be nice to have a few more checkable numbers, which
    requires that I either think hard about the probabilities of
    particular sequences of events or give up and get an introductory
    textbook on probability and statistics out of the library.

    (If one were writing a simulation of coin tossing, one would expect
    trials of 100 tosses to end up with particular numbers of heads with
    a certain frequency - how closely does the simulation match the
    expected results?  If half of a set of trials were 100 heads in a
    row, and the other half were 100 tails in a row, I would be inclined
    to think that a bug in the simulation might be more likely than that
    a random process had produced that result.  (Particularly if it was
    me who wrote the simulation.)

Second, implement different behavior, using the idea that both you and
Mike Kay have described: use a single long-enough sequence of random
numbers for each run of the simulation.

  - That sequence may be generated and saved externally using a
    pseudo-random-number generator (e.g. the one in FXSL, or
    fn:random-number-generator(), or any of the random number generators
    available for me to use.

  - Or it could be acquired from any of the several sources of random
    bits available on the net: random.org generates random numbers from
    observations of atmospheric noise; NIST publishes a block of 512
    random bits every 60 seconds (a service apparently called the NIST
    Randomness Beacon), although I did not see what their source of bits
    is; the web says there used to be one service that used lava lamps
    as an easily observable chaotic system.

    For use in simulations like mine, I think that pseudo-random numbers
    should serve the purpose as well as true random numbers acquired
    from quantum systems or from chaotic systems like weather, but it
    will be interesting to try both and see if they produce similar
    results.

> In case you need the seeds to be not too-close, you could generate a
> separate sequence of random numbers each of which to be used as the
> seed for a separate simulation.

This is also a good idea.  I have learned that some people select their
keys from a list of numbers generated by a source of 'true' random
numbers like random.org.  This appeals to me, though it probably doesn't
matter for this task.

Thanks to all who have responded for your help; I will report back if
and when I have a simulation that passes the plausibility tests I can
formulate for it.

Michael    

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Current Thread