Re: [xsl] help with random numbers

Subject: Re: [xsl] help with random numbers
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 17 Nov 2022 02:57:04 -0000
I hope this information might be useful:
https://fxsl.sourceforge.net/articles/Random/Casting%20the%20Dice%20with%20FXSL-htm.htm#3._Testing_randomness_with_Monte_Carlo_integration

A test with 65536 random points is carried out using FXSL (XSLT 1.0),
that calculates with the Monte-Carlo method the values of the integrals of
3 well-known functions:

f x = 4 / (1 + x^2),    x ∈ [0, 1]

f x = x,    x ∈ [0, 1]

f x = 1 / x,    x ∈ [1, 2]

The well-known results of the integration: pi, 1/2 and ln(2) were
calculated with good precision.
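
For readers without FXSL at hand, the same kind of check can be sketched in Python. This is only an illustration of the Monte Carlo method, not the FXSL code from the article: it uses Python's own generator, and the seed 2022 and the helper name mc_integral are arbitrary choices made for this sketch.

```python
import math
import random

def mc_integral(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] as (b - a) times the
    mean of f at n uniformly drawn points."""
    total = 0.0
    for _ in range(n):
        total += f(rng.uniform(a, b))
    return (b - a) * total / n

rng = random.Random(2022)  # arbitrary fixed seed, so the run is repeatable
N = 65536                  # same number of points as in the FXSL test

est_pi = mc_integral(lambda x: 4.0 / (1.0 + x * x), 0.0, 1.0, N, rng)
est_half = mc_integral(lambda x: x, 0.0, 1.0, N, rng)
est_ln2 = mc_integral(lambda x: 1.0 / x, 1.0, 2.0, N, rng)

print("pi   :", est_pi, "exact", math.pi)
print("1/2  :", est_half, "exact", 0.5)
print("ln 2 :", est_ln2, "exact", math.log(2))
```

With this many points the estimates should land within a few thousandths of the exact values; a generator with poor uniformity would tend to show up as a larger, persistent error.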

This was done and published more than 20 years ago and is still usable  :)

Thanks,
Dimitre




On Wed, Nov 16, 2022 at 6:25 PM C. M. Sperberg-McQueen
cmsmcq@xxxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>
> Thanks for your reply.
>
> "Michael Kay michaelkay90@xxxxxxxxx" <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> writes:
>
> > ...
> >
> > Worth noting in passing that there's a bug in the spec: it says that
> > every xs:double value in the range 0 to 1 should be equally likely to
> > appear, but that's not what you actually want, because there are about
> > as many xs:double values in the range 0 to 0.1 as there are in the
> > range 0.1 to 1.0. Fortunately implementors are unlikely to have taken
> > much notice of that provision; but it does illustrate the dangers.
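
The uneven spacing of doubles can be checked by counting bit patterns. The sketch below assumes IEEE 754 binary64 (which xs:double is) and relies on the fact that, for non-negative doubles, the integer bit pattern grows in the same order as the value; the helper name bits is made up for this example. On that count the skew toward small values looks even stronger than an even split.

```python
import struct

def bits(x):
    """Bit pattern of a non-negative double read as a signed 64-bit
    integer; for x >= 0 this preserves numeric order."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

low = bits(0.1) - bits(0.0)        # doubles in [0, 0.1), subnormals included
high = bits(1.0) - bits(0.1) + 1   # doubles in [0.1, 1.0]

print("doubles in [0, 0.1)  :", low)
print("doubles in [0.1, 1.0]:", high)
print("ratio                :", low / high)
```

Either way, choosing each representable double with equal probability would pile almost all results up near zero, which is the danger being pointed out.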
>
> You're right!  I was so confident that what it said was what it should
> say that I misread
>
>     The value of the number entry should be such that all eligible
>     xs:double values are equally likely to be chosen.
>
> as saying that all real values in the interval are equally likely to be
> chosen, i.e. that the values would be selected from a uniform
> distribution over the interval.
>
> At some point I will doubtless also want to select values from other
> distributions, too, but I suspect that there are ways to make that
> happen.
>
> > If you want a completely uniform distribution, consider using the
> > permute() option. But of course a uniform distribution is very far
> > from random (it's well known that a truly random sequence contains far
> > more duplicates than a typical person will expect).
>
> > I think the idea of fn:random-number-generator() is that you can
> > choose whether you want a repeatable sequence of random numbers or a
> > different sequence each time. For the latter, use current-dateTime() as
> > a seed.
>
> My situation, I regret to say, is that if possible I want both: I would
> like a given set of a hundred or a thousand or ten thousand simulation
> runs to be repeatable, and I would also like each run in such a set to
> have a different sequence.  (And, if possible, I would like each set to
> vary, in a reproducible way.)
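
One common way to get both properties is to record a single master seed per set and derive the per-run seeds from it. The following is a minimal Python sketch of that idea, not XSLT; run_seeds and the seed values are hypothetical names and numbers chosen for illustration.

```python
import random

def run_seeds(master_seed, runs):
    """Derive one 64-bit seed per simulation run from a recorded master seed."""
    master = random.Random(master_seed)
    return [master.getrandbits(64) for _ in range(runs)]

# Recording 20221116 is enough to replay the whole set of 1000 runs.
set_a = run_seeds(20221116, 1000)
# A different master seed gives a different, equally repeatable set.
set_b = run_seeds(20221117, 1000)

print(set_a[:3])
print(set_b[:3])
```

Each run then starts its own generator from its own seed, so runs within a set differ while the set as a whole can be reproduced from one number.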
>
> But if I use current-dateTime() or some other method to introduce
> variation into the initial seed, I can of course record the seed used
> and use it again if I want to replicate the simulations.  I suspect that
> for my current code, which tends to use just the first two values of the
> sequence of numbers generated by a given seed, I will want more
> variation in the seed than current-dateTime() will give: a minimally
> conforming processor has about 3.1E14 different dateTime values (and the
> simulation runs in a set will vary only in minute, second, and
> millisecond, so maybe 1E5 different values), which is a lot less than
> 4.5E18, which the Web tells me is roughly the number of double precision
> values in the interval [0,1].
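
The two magnitudes can be checked with a few lines of Python. The sketch assumes that "minimally conforming" means four-digit years 0001 to 9999 with millisecond resolution, and it ignores time zones; both are assumptions of this example rather than statements from the thread.

```python
import struct
from datetime import datetime, timedelta

# Distinct millisecond-resolution dateTime values in years 0001..9999.
first = datetime(1, 1, 1)
last = datetime(9999, 12, 31, 23, 59, 59, 999000)
datetime_values = (last - first) // timedelta(milliseconds=1) + 1

# Doubles in [0, 1]: the bit pattern of 1.0 counts every smaller
# non-negative double, plus one for 1.0 itself.
doubles_0_1 = struct.unpack("<q", struct.pack("<d", 1.0))[0] + 1

print("dateTime values :", float(datetime_values))
print("doubles in [0,1]:", float(doubles_0_1))
print("ratio           :", doubles_0_1 / datetime_values)
```

Under these assumptions the seed space is roughly four orders of magnitude smaller than the space of possible results.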
>
> (I believe I saw a story once about a flawed online poker system whose
> card-shuffling routine used a 32-bit random number to shuffle the cards, with
> the result that there were about 2^32 ≈ 4E9 possible hands, instead of
> 52! ≈ 8E67, and worse yet used milliseconds-since-midnight as the seed,
> so those who broke the system just needed to try a few hundred or a few
> thousand clock values to find the value that produced the hand they were
> holding, at which point they could see everyone else's hand, too.  This
> has led me to believe that -- especially if one is taking the first
> number generated from a given seed -- it pays if the range of possible
> seeds is about the same size as the range of possible results of the
> random number generator, and/or the range of possible phenomena being
> selected.)
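
The sizes involved in the poker anecdote are easy to compute. The Python lines below only reproduce the arithmetic; they say nothing about the actual system described.

```python
import math

seed_states = 2 ** 32                       # outcomes reachable from a 32-bit seed
deck_orders = math.factorial(52)            # orderings of a 52-card deck
bits_needed = math.log2(deck_orders)        # seed bits needed to reach every ordering
ms_per_day = 24 * 60 * 60 * 1000            # milliseconds-since-midnight values

print("32-bit seeds      :", float(seed_states))
print("deck orderings    :", float(deck_orders))
print("bits for 52!      :", bits_needed)
print("clock-based seeds :", ms_per_day)
```

A seed drawn from the millisecond clock narrows the search to under a hundred million candidates, and far fewer when the approximate time of the shuffle is known.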
>
>
> > I'm sure you're right that you want a single "flat" sequence of random
> > numbers, you don't want a branching sequence; and achieving a "flat"
> > sequence when you're doing recursive tree traversal isn't
> > straightforward. Using an accumulator is an interesting idea. I've
> > previously used xsl:number level="any" to index into a sequence of
> > pre-allocated random numbers.
>
> That seems like a feasible idea:  before calling apply-templates for a
> given time slice I can generate a sequence of random numbers (it's easy
> to know in advance how many are needed) and index into it in the way you
> describe.
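
The indexing idea can be sketched outside XSLT as follows. In this Python illustration a running counter stands in for what xsl:number level="any" would compute (a node's position in document order); the tuple-based tree and the function names are invented for the example.

```python
import itertools
import random

def size(node):
    """Number of nodes in a (name, children) tree."""
    return 1 + sum(size(child) for child in node[1])

def walk(node, numbers, position, out):
    """Visit nodes in document (pre-)order; each node takes the
    pre-allocated number at its own position, so sibling subtrees
    never reuse or branch the sequence."""
    name, children = node
    out[name] = numbers[next(position)]
    for child in children:
        walk(child, numbers, position, out)
    return out

tree = ("a", [("b", [("d", [])]), ("c", [])])

rng = random.Random(42)                              # arbitrary fixed seed
numbers = [rng.random() for _ in range(size(tree))]  # allocated before the walk
draws = walk(tree, numbers, itertools.count(), {})

for name, value in draws.items():
    print(name, value)
```

The random numbers are generated once, in a flat sequence, and the recursion only reads from it.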
>
> > You seem to be doing all the right things. You say the results are
> > "disappointing", and I feel I'd like to know more about what that
> > means.
>
> That's a good question.  The main symptom so far is that the first
> hundred times I ran the simulation, 80 of the runs produced the same
> trivial result: a birth-and-death process in which the initial
> individual died before reproducing - the runs differed on how long that
> individual lived, so they were not completely identical, but the family
> trees they produced were isomorphic: one node with birth and death
> dates.
>
> It is possible, of course, that that form of result is more probable
> than I had expected.  So with a little effort I calculated the
> probability that the birth and death rates I was using should produce
> the result.  The probability *is* higher than I had expected, but if my
> calculation is accurate it's about 0.32, not about 0.80.
>
> > ...
> > Specifically: a small change in the seed results in only a small
> > change in the first value in the sequence. (The Saxon implementation
> > calls Java's Random class passing a seed which is the hash code of the
> > seed passed at the XPath level. For integers, the hash code of a small
> > integer is the integer itself, which may well have something to do
> > with it.)
> >
> > If I multiply the supplied seed by 987654321 before passing it to
> > Java, the pattern looks a lot more "random":
> >
> > ...
> >
> > Alternatively, discard the first couple of items in the sequence.
>
> An experiment I just ran suggests that discarding two values and taking
> the third for each key does indeed lead to a much wider spread in the
> resulting numbers; even discarding a single value helps a lot.  Perhaps
> that will help.  Discarding ten appears to be more than is needed.
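
The effect can be reproduced without Saxon. The Python sketch below re-implements the linear congruential generator documented for java.util.Random and feeds it the small integers 1 to 100 directly as seeds, on the assumption (taken from the remark above about hash codes) that this is what the Java class ends up receiving. The class and helper names are made up for this example.

```python
MULT = 0x5DEECE66D
MASK = (1 << 48) - 1

class JavaRandom:
    """Python re-implementation of the 48-bit LCG documented for java.util.Random."""
    def __init__(self, seed):
        self.state = (seed ^ MULT) & MASK          # initial scramble
    def _next(self, nbits):
        self.state = (self.state * MULT + 0xB) & MASK
        return self.state >> (48 - nbits)
    def next_double(self):
        return ((self._next(26) << 27) + self._next(27)) / (1 << 53)

def nth_double(seed, n):
    """The n-th double produced from the given seed (n = 1 is the first)."""
    gen = JavaRandom(seed)
    for _ in range(n - 1):
        gen.next_double()                           # discard earlier values
    return gen.next_double()

def spread(values):
    return max(values) - min(values)

seeds = range(1, 101)
first = [nth_double(s, 1) for s in seeds]
third = [nth_double(s, 3) for s in seeds]
scaled = [nth_double(s * 987654321, 1) for s in seeds]

print("spread of first value          :", spread(first))
print("spread of third value          :", spread(third))
print("spread of first, seed * 987654321:", spread(scaled))
```

For neighbouring small seeds the first values fall in a narrow band, while discarding two values or scaling the seed spreads them over much more of the unit interval.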
>
> I will try one or more of these approaches and see what happens.  If and
> when the simulation starts to produce results that match the
> probabilities I am able to calculate for some simple cases (like the
> one-individual case described above), then I will begin to have more
> confidence in my simulation.
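
A check of this kind might look like the following Python sketch. The model is hypothetical and is not the simulation discussed in this thread: it assumes a continuous-time birth-and-death process with exponential waiting times, for which the chance that the first individual dies before reproducing is death_rate / (birth_rate + death_rate). The rates, run count, and seed are likewise illustrative.

```python
import random

def dies_childless(birth_rate, death_rate, rng):
    """True if the first individual's death clock fires before its birth clock."""
    return rng.expovariate(death_rate) < rng.expovariate(birth_rate)

def childless_fraction(birth_rate, death_rate, runs, seed):
    """Fraction of simulated runs in which the first individual dies childless."""
    rng = random.Random(seed)
    hits = sum(dies_childless(birth_rate, death_rate, rng) for _ in range(runs))
    return hits / runs

BIRTH, DEATH = 1.0, 0.5                 # hypothetical rates
expected = DEATH / (BIRTH + DEATH)      # analytical probability under this model
observed = childless_fraction(BIRTH, DEATH, 20000, seed=1)

print("expected:", expected)
print("observed:", observed)
```

A gap between the two figures much larger than sampling noise would point at the random number handling rather than at the model.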
>
> --
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> http://blackmesatech.com
