Re: [xsl] help with random number generation

Subject: Re: [xsl] help with random number generation
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 16 Nov 2022 20:45:30 -0000
I would implement/use (even from the 20-year-old FXSL) a random-sequence()
function that produces a sequence of $N random numbers.

Thus, if you know the total number of nodes for which random numbers are
needed, you could use that count as the value for $N.
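
A minimal sketch of such a function in XSLT 3.0, built on the standard
fn:random-number-generator() rather than on FXSL. The names
my:random-sequence, the urn:example:random namespace, the $seed parameter
and the individual/@alive markup are all hypothetical, chosen only for
illustration. It assumes each live node needs two numbers per timeslice
(one for birth, one for death), hence $N = 2 * count($live).

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:random"
    exclude-result-prefixes="#all">

  <!-- Hypothetical run-level seed: pass a different value for each run. -->
  <xsl:param name="seed" as="xs:integer" select="42"/>

  <!-- Returns $n doubles (each >= 0 and < 1) by following the 'next'
       chain of one generator that is seeded exactly once. -->
  <xsl:function name="my:random-sequence" as="xs:double*">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:param name="seed" as="xs:anyAtomicType"/>
    <xsl:iterate select="1 to $n">
      <xsl:param name="gen" as="map(*)"
                 select="random-number-generator($seed)"/>
      <xsl:sequence select="$gen?number"/>
      <xsl:next-iteration>
        <xsl:with-param name="gen" select="$gen?next()"/>
      </xsl:next-iteration>
    </xsl:iterate>
  </xsl:function>

  <!-- Assumed input markup: live nodes are individual[@alive = 'true']. -->
  <xsl:template match="/">
    <xsl:variable name="live" select="//individual[@alive = 'true']"/>
    <xsl:variable name="r" as="xs:double*"
                  select="my:random-sequence(2 * count($live), $seed)"/>
    <draws>
      <xsl:for-each select="$live">
        <xsl:variable name="i" select="position()"/>
        <!-- Node i takes numbers 2i-1 (birth) and 2i (death). -->
        <draw birth="{$r[2 * $i - 1]}" death="{$r[2 * $i]}"/>
      </xsl:for-each>
    </draws>
  </xsl:template>

</xsl:stylesheet>
```

Because every node indexes into the same sequence, siblings no longer
share numbers, and the generator is seeded only once per call.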

As for using time for a seed, new (different and actual) current times can
be produced even in pure XPath 3.1.  But I would prefer to generate a
single sequence of randoms rather than start from a different seed every
time.
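
One possible way to keep a single sequence going across timeslices, as a
sketch only: return the advanced generator together with the numbers drawn,
and hand that generator to the next timeslice instead of reseeding. The
function name my:draw, the map keys 'numbers' and 'next', the $seed
parameter and the fixed count of 3 draws per slice are hypothetical.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:random"
    exclude-result-prefixes="#all">

  <!-- Hypothetical run-level seed: pass a different value for each run. -->
  <xsl:param name="seed" as="xs:integer" select="42"/>

  <!-- Draws $n numbers from $gen. Returns a map holding the numbers and
       the generator positioned after the last draw. -->
  <xsl:function name="my:draw" as="map(*)">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:param name="gen" as="map(*)"/>
    <xsl:iterate select="1 to $n">
      <xsl:param name="g" as="map(*)" select="$gen"/>
      <xsl:param name="acc" as="xs:double*" select="()"/>
      <xsl:on-completion>
        <xsl:sequence select="map { 'numbers': $acc, 'next': $g }"/>
      </xsl:on-completion>
      <xsl:next-iteration>
        <xsl:with-param name="g" select="$g?next()"/>
        <xsl:with-param name="acc" select="$acc, $g?number"/>
      </xsl:next-iteration>
    </xsl:iterate>
  </xsl:function>

  <!-- The only call that uses the seed. -->
  <xsl:variable name="slice1" as="map(*)"
                select="my:draw(3, random-number-generator($seed))"/>
  <!-- The second timeslice continues where the first one stopped. -->
  <xsl:variable name="slice2" as="map(*)"
                select="my:draw(3, $slice1?next)"/>

  <!-- Run with the initial-template option of an XSLT 3.0 processor. -->
  <xsl:template name="xsl:initial-template">
    <slices>
      <slice n="1" values="{$slice1?numbers}"/>
      <slice n="2" values="{$slice2?numbers}"/>
    </slices>
  </xsl:template>

</xsl:stylesheet>
```

In a real simulation the 'next' generator would be passed along as a
template parameter from one timeslice to the following one, so the root of
the tree does not see the same numbers in every pass.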

On Wed, Nov 16, 2022 at 11:30 AM C. M. Sperberg-McQueen
cmsmcq@xxxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Some readers of this list may know enough about pseudo-random number
> generators and their use to advise me.  I hope so!
>
> I am writing an XSLT program to simulate a process, with the aim of
> using it to make Monte Carlo estimates of the probability that the
> process will produce different kinds of results.  The state of the
> simulation is represented by an XML document whose size and shape vary
> over time; at each timeslice, we apply templates to the tree produced by
> the preceding timeslice, and generate a new tree.  The simulation is
> intended to implement a simple birth-and-death model of a population:
> for each 'live' node in the tree, we choose randomly whether it gets a
> new child in this time slice or not, and also choose randomly whether
> the node dies in this time slice or not. We start with a single live
> individual and end up with a family tree.  (My interest in the
> simulation, if it matters, is in observing the effects of changes to the
> assumed birth and death rates on the resulting population of trees, and
> the probability that family trees of a given shape will develop.)
>
> In a Monte Carlo simulation, the quality of the random numbers used is
> likely to matter a good deal.  In particular, if there is too tight a
> correlation among the random numbers, the results are going to be biased
> in ways that are going to be very hard to understand and explain (and in
> some cases, hard to detect).
>
> XPath 3.1 has a random-number-generator() function which I would like to
> use if I can, but the setup I have chosen seems to impose some barriers.
>
>  - I can't just call fn:random-number-generator() each time I need a
>    random number, because the function is specified as deterministic,
>    and while the implementation-dependent seed may vary from run to run
>    of my stylesheet, it should not vary within the run.
>
>    If you want more than one random number, the idea is to use something
>    like
>
>      <xsl:variable name="rmap" as="map(*)"
>                    select="random-number-generator()"/>
>
>    for the first call, get your number using $rmap('number'), and make
>    the second call with something like
>
>      <xsl:variable name="rmap2" as="map(*)"
>                    select="$rmap('next')()"/>
>
>    and so on.  The seed to be used by the next call is embedded in the
>    function returned as the 'next' member of the map.
>
>  - I can do this as I descend the tree, so that the parent element
>    generates the random numbers it needs and then passes the appropriate
>    function to its child.
>
>    In that case, I'll get (what I hope is) a nice sequence of random
>    numbers as I descend the tree.  But each child is going to get the
>    same random-number generation function, which will mean that each
>    child is going to get the same random numbers and the simulation will
>    show siblings always behaving the same way.  Not a good idea.  I need
>    to thread a single sequence of random numbers (and generators)
>    through the tree, so that each node uses a different seed and will
>    get different random numbers.
>
>  - So maybe I could use an accumulator?
>
>    That will allow each node to get and use a different hidden seed, but
>    as far as I can tell, each time slice is going to start its traversal
>    of the tree with the same initial function, which means the root of
>    the tree is always going to get the same random numbers, and its
>    behavior will be the same in every timeslice.
>
>  - So maybe I can produce a different seed for each time I need to get a
>    random number, based on which time slice we are in, and which node we
>    are processing, and -- since I don't want every run of the simulator
>    to produce the same results -- a seed passed in as a parameter (so
>    that every run can be passed a different value, and they will produce
>    different results).
>
>    Then for each node that needs one or more random numbers, I calculate
>    a seed (ideally different for each pass over each node) and call
>    fn:random-number-generator() with that seed.
>
>    The results are ... disappointing, and I am hoping someone here can
>    help me do better.
>
> If I understand the state of play on conventional random number
> generators, for a good one, over time each value between 0 and 1 should
> have an approximately equal likelihood of turning up at any given time.
>
> For a given initial seed, however, the result is always going to be the
> same (which is good, for reproducing errors and for debugging).
>
> And I notice that if I call the random number generator a hundred times,
> using the integers 0 to 99 as seeds, the results I get don't vary much:
> the first few digits of the number are often the same.
>
> I have experimented with various ways of combining the number of the
> timeslice, the position of the node in the tree, and the seed passed to
> the stylesheet as a parameter, and ... the results seem to indicate that
> there is not enough variation in the random numbers I am getting.  In
> particular, when I calculate the probability of certain results arising,
> and compare that to the number of times the simulation produces those
> results, they are too far apart for my comfort.  (Choosing a shape for
> which the arithmetic is relatively easy, I calculate a 32% chance of the
> simulation generating that shape, but consistently get 80% or more
> results of that shape.  It's one thing for that to happen now and then,
> but it has now happened too often.  My prior belief in the correctness
> of my approach is not strong enough to withstand the evidence that the
> coin I am flipping is biased in ways I don't know how to control.)
>
> I have tried to pose this question in a general form, to give it broader
> interest and applicability; if I have inadvertently elided important
> details, feel free to ask -- the details are not secret, just (I hope)
> not relevant to the general question.
>
> If anyone can think of a relatively simple way to improve the random
> numbers I am generating, I would be glad of any hints.
>
> --
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> http://blackmesatech.com

-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs. Will they write
all patents, too? :)
-------------------------------------
Sanity is madness put to good use.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.
