[xsl] help with random number generation

Subject: [xsl] help with random number generation
From: "C. M. Sperberg-McQueen cmsmcq@xxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 16 Nov 2022 19:30:44 -0000
Some readers of this list may know enough about pseudo-random number
generators and their use to advise me.  I hope so!

I am writing an XSLT program to simulate a process, with the aim of
using it to make Monte Carlo estimates of the probability that the
process will produce different kinds of results.  The state of the
simulation is represented by an XML document whose size and shape vary
over time; at each timeslice, we apply templates to the tree produced by
the preceding timeslice, and generate a new tree.  The simulation is
intended to implement a simple birth-and-death model of a population:
for each 'live' node in the tree, we choose randomly whether it gets a
new child in this time slice or not, and also choose randomly whether
the node dies in this time slice or not. We start with a single live
individual and end up with a family tree.  (My interest in the
simulation, if it matters, is in observing the effects of changes to the
assumed birth and death rates on the resulting population of trees, and
the probability that family trees of a given shape will develop.)

In a Monte Carlo simulation, the quality of the random numbers used is
likely to matter a good deal.  In particular, if there is too tight a
correlation among the random numbers, the results are going to be biased
in ways that are going to be very hard to understand and explain (and in
some cases, hard to detect).

XPath 3.0 has a random-number-generator() function which I would like to
use if I can, but the setup I have chosen seems to impose some barriers.

 - I can't just call fn:random-number-generator() each time I need a
   random number, because the function is specified as deterministic,
   and while the implementation-dependent seed may vary from run to run
   of my stylesheet, it should not vary within the run.

   If you want more than one random number, the idea is to use something
   like

     <xsl;variable name="rmap" as="map(*)"
                   select="random-number-generator()"/>

   for the first call, get your number using $map('number'), and make
   the second call with something like

     <xsl;variable name="rmap2" as="map(*)"
                   select="$map('next')()"/>

   and so on.  The seed to be used by the next call is embedded in the
   function returns as the 'next' member of the map.

 - I can do this as I descend the tree, so that the parent element
   generates the random numbers it needs and then passes the appropriate
   function to its child.

   In that case, I'll get (what I hope is) a nice sequence of random
   numbers as I descend the tree.  But each child is going to get the
   same random-number generation function, which will mean that each
   child is going to get the same random numbers and the simulation will
   show siblings always behaving the same way.  Not a good idea.  I need
   to thread a single sequence of random numbers (and generators)
   through the tree, so that each node uses a different seed and will
   get different random numbers.

 - So maybe I could use an accumulator?

   That will allow each node to get and use a different hidden seed, but
   as far as I can tell, each time slice is going to start its traversal
   of the tree with the same initial function, which means the root of
   the tree is always going to get the same random numbers, and its
   behavior will be the same in every timeslice.

 - So maybe I can produce a different seed for each time I need to get a
   random number, based on which time slice we are in, and which node we
   are processing, and -- since I don't want every run of the simulator
   to produce the same results -- a seed passed in as a parameter (so
   that every run can be passed a different value, and they will produce
   different results).

   Then for each node that needs one or more random numbers, I calculate
   a seed (ideally different for each pass over each node) and call
   fn:random-number-generator() with that seed.

   The results are ... disappointing, and I am hoping someone here can
   help me do better.

If I understand the state of play on conventional random number
generators, for a good one, over time each value between 0 and 1 should
have an approximately equal likelihood of turning up at any given time.

For a given initial seed, however, the result is always going to be the
same (which is good, for reproducing errors and for debugging).

And I notice that if I call the random number generator a hundred times,
using the integers 0 to 99 as seeds, the results I get don't vary much:
the first few digits of the number are often the same.

I have experimented with various ways of combining the number of the
timeslice, the position of the node in the tree, and the seed passed to
the stylesheet as a parameter, and ... the results seem to indicate that
there is not enough variation in the random numbers I am getting.  In
particular, when I calculate the probability of certain results arising,
and compare that to the number of times the simulation produces those
results, they are too far apart for my comfort.  (Choosing a shape for
which the arithmetic is relatively easy, I calculate a 32% chance of the
simulation generating that shape, but consistently get 80% or more
results of that shape.  It's one thing for that to happen now and then,
but it has now happened too often.  My prior belief in the correctness
of my approach is not strong enough to withstand the evidence that the
coin I am flipping is biased in ways I don't know how to control.)

I have tried to pose this question in a general form, to give it broader
interest and applicability; if I have inadvertently elided important
details, feel free to ask -- the details are not secret, just (I hope)
not relevant to the general question.

If anyone can think of a relatively simple way to improve the random
numbers I am generating, I would be glad of any hints.  

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Current Thread