Subject: Re: [xsl] help with random numbers
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 17 Nov 2022 02:57:04 -0000

I hope this information might be useful:

https://fxsl.sourceforge.net/articles/Random/Casting%20the%20Dice%20with%20FXSL-htm.htm#3._Testing_randomness_with_Monte_Carlo_integration

A test with 65536 random points is carried out using FXSL (XSLT 1.0),
that calculates with the Monte-Carlo method the values of the integrals
of 3 well-known functions:

  f x = 4 / (1 + x^2),  x ∈ [0, 1]
  f x = x,              x ∈ [0, 1]
  f x = 1 / x,          x ∈ [1, 2]

The well-known results of the integration: pi, ½ and ln(2) were
calculated with good precision.

This was done and published more than 20 years ago and is still usable :)

Thanks,
Dimitre

On Wed, Nov 16, 2022 at 6:25 PM C. M. Sperberg-McQueen cmsmcq@xxxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Thanks for your reply.
>
> "Michael Kay michaelkay90@xxxxxxxxx" <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> writes:
>
> > ...
> >
> > Worth noting in passing that there's a bug in the spec: it says that
> > every xs:double value in the range 0 to 1 should be equally likely to
> > appear, but that's not what you actually want, because there are about
> > as many xs:double values in the range 0 to 0.1 as there are in the
> > range 0.1 to 1.0. Fortunately implementors are unlikely to have taken
> > much notice of that provision; but it does illustrate the dangers.
>
> You're right! I was so confident that what it said was what it should
> say that I misread
>
>   The value of the number entry should be such that all eligible
>   xs:double values are equally likely to be chosen.
>
> as saying that all real values in the interval are equally likely to be
> chosen, i.e. that the values would be selected from a uniform
> distribution over the interval.
>
> At some point I will doubtless also want to select values from other
> distributions, too, but I suspect that there are ways to make that
> happen.
>
> > If you want a completely uniform distribution, consider using the
> > permute() option.
> > But of course a uniform distribution is very far
> > from random (it's well known that a truly random sequence contains far
> > more duplicates than a typical person will expect).
>
> > I think the idea of fn:random-number-generator() is that you can
> > choose whether you want a repeatable sequence of random numbers or a
> > different sequence each time. For the latter, use current-dateTime() as
> > a seed.
>
> My situation, I regret to say, is that if possible I want both: I would
> like a given set of a hundred or a thousand or ten thousand simulation
> runs to be repeatable, and I would also like each run in such a set to
> have a different sequence. (And, if possible, I would like each set to
> vary, in a reproducible way.)
>
> But if I use current-dateTime() or some other method to introduce
> variation into the initial seed, I can of course record the seed used
> and use it again if I want to replicate the simulations. I suspect that
> for my current code, which tends to use just the first two values of the
> sequence of numbers generated by a given seed, I will want more
> variation in the seed than current-dateTime() will give: a minimally
> conforming processor has about 3.1E14 different dateTime values (and the
> simulation runs in a set will vary only in minute, second, and
> millisecond, so maybe 1E5 different values), which is a lot less than
> 4.5E18, which the Web tells me is roughly the number of double precision
> values in the interval [0,1].
>
> (I believe I saw a story once about a flawed online poker system whose
> card-shuffling routine used a 32-bit random number to shuffle the cards,
> with the result that there were about 2^32 ≈ 4E9 possible hands, instead
> of 52! ≈ 8E67, and worse yet used milliseconds-since-midnight as the seed,
> so those who broke the system just needed to try a few hundred or a few
> thousand clock values to find the value that produced the hand they were
> holding, at which point they could see everyone else's hand, too.
> This has led me to believe that -- especially if one is taking the first
> number generated from a given seed -- it pays if the range of possible
> seeds is about the same size as the range of possible results of the
> random number generator, and/or the range of possible phenomena being
> selected.)
>
> > I'm sure you're right that you want a single "flat" sequence of random
> > numbers, you don't want a branching sequence; and achieving a "flat"
> > sequence when you're doing recursive tree traversal isn't
> > straightforward. Using an accumulator is an interesting idea. I've
> > previously used xsl:number level="any" to index into a sequence of
> > pre-allocated random numbers.
>
> That seems like a feasible idea: before calling apply-templates for a
> given time slice I can generate a sequence of random numbers (it's easy
> to know in advance how many are needed) and index into it in the way you
> describe.
>
> > You seem to be doing all the right things. You say the results are
> > "disappointing", and I feel I'd like to know more about what that
> > means.
>
> That's a good question. The main symptom so far is that the first
> hundred times I ran the simulation, 80 of the runs produced the same
> trivial result: a birth-and-death process in which the initial
> individual died before reproducing - the runs differed on how long that
> individual lived, so they were not completely identical, but the family
> trees they produced were isomorphic: one node with birth and death
> dates.
>
> It is possible, of course, that that form of result is more probable
> than I had expected. So with a little effort I calculated the
> probability that the birth and death rates I was using should produce
> the result. The probability *is* higher than I had expected, but if my
> calculation is accurate it's about 0.32, not about 0.80.
>
> > ...
> > Specifically: a small change in the seed results in only a small
> > change in the first value in the sequence.
> > (The Saxon implementation
> > calls Java's Random class passing a seed which is the hash code of the
> > seed passed at the XPath level. For integers, the hash code of a small
> > integer is the integer itself, which may well have something to do
> > with it.)
> >
> > If I multiply the supplied seed by 987654321 before passing it to
> > Java, the pattern looks a lot more "random":
> >
> > ...
> >
> > Alternatively, discard the first couple of items in the sequence.
>
> An experiment I just ran suggests that discarding two values and taking
> the third for each key does indeed lead to a much wider spread in the
> resulting numbers; even discarding a single value helps a lot. Perhaps
> that will help. Discarding ten appears to be more than is needed.
>
> I will try one or more of these approaches and see what happens. If and
> when the simulation starts to produce results that match the
> probabilities I am able to calculate for some simple cases (like the
> one-individual case described above), then I will begin to have more
> confidence in my simulation.
>
> --
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> http://blackmesatech.com
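
The seed effect discussed in the quoted thread (small integer seeds passed more or less directly to Java's Random give nearly identical first values) can be reproduced outside XSLT. The following is only a sketch: it uses java.util.Random directly and does not reproduce Saxon's actual code path. The class name SeedSpread and the helpers draw and spread are hypothetical, and the choice of 100 seeds is arbitrary. It compares the spread (max minus min) of the values obtained from seeds 0..99 when the seed is used as is, when it is first multiplied by 987654321, and when the first one or two values are discarded.

```java
import java.util.Random;
import java.util.function.LongUnaryOperator;

public class SeedSpread {
    // Multiplier suggested in the thread; any large odd constant is an assumption here.
    static final long MULTIPLIER = 987654321L;

    /** Value drawn from a java.util.Random with the given seed, after discarding `skip` draws. */
    static double draw(long seed, int skip) {
        Random rng = new Random(seed);
        for (int i = 0; i < skip; i++) {
            rng.nextDouble(); // discarded value
        }
        return rng.nextDouble();
    }

    /** max - min of the values drawn for seeds 0..count-1, each seed first mapped by seedMap. */
    static double spread(int count, int skip, LongUnaryOperator seedMap) {
        double min = 1.0;
        double max = 0.0;
        for (long s = 0; s < count; s++) {
            double v = draw(seedMap.applyAsLong(s), skip);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return max - min;
    }

    public static void main(String[] args) {
        System.out.printf("raw seeds, first value:        %.4f%n", spread(100, 0, s -> s));
        System.out.printf("multiplied seeds, first value: %.4f%n", spread(100, 0, s -> s * MULTIPLIER));
        System.out.printf("raw seeds, discard one:        %.4f%n", spread(100, 1, s -> s));
        System.out.printf("raw seeds, discard two:        %.4f%n", spread(100, 2, s -> s));
    }
}
```

With raw seeds the first values all fall in a narrow band, which is consistent with many simulation runs taking the same branch; scrambling the seed or discarding initial values widens the band.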
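
The Monte Carlo check mentioned at the top of this message (65536 random points, three integrals with known values pi, ½ and ln(2)) can be sketched in Java as a quick sanity test for any generator. This is not the FXSL implementation: it assumes the simple sample-mean estimator, uses java.util.Random as a stand-in generator, and the class name MonteCarloIntegrals, the method integrate and the seed 2022 are hypothetical.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

public class MonteCarloIntegrals {
    /**
     * Sample-mean Monte Carlo estimate of the integral of f over [a, b]:
     * (b - a) times the average of f at n uniformly drawn points.
     */
    static double integrate(DoubleUnaryOperator f, double a, double b, int n, Random rng) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double x = a + (b - a) * rng.nextDouble(); // uniform point in [a, b)
            sum += f.applyAsDouble(x);
        }
        return (b - a) * sum / n;
    }

    public static void main(String[] args) {
        int n = 65536;                  // same number of points as in the message
        Random rng = new Random(2022L); // arbitrary fixed seed, for repeatability
        System.out.println("4/(1+x^2) on [0,1]: " + integrate(x -> 4.0 / (1.0 + x * x), 0.0, 1.0, n, rng)
                + "  (exact: " + Math.PI + ")");
        System.out.println("x on [0,1]:         " + integrate(x -> x, 0.0, 1.0, n, rng)
                + "  (exact: 0.5)");
        System.out.println("1/x on [1,2]:       " + integrate(x -> 1.0 / x, 1.0, 2.0, n, rng)
                + "  (exact: " + Math.log(2.0) + ")");
    }
}
```

A generator whose estimates drift well away from the exact values at this sample size would be suspect; agreement alone does not prove good randomness.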