Subject: Re: [xsl] help with random numbers From: "C. M. Sperberg-McQueen cmsmcq@xxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> Date: Thu, 17 Nov 2022 02:25:54 -0000 |
Thanks for your reply. "Michael Kay michaelkay90@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> writes: > ... > > Worth noting in passing that there's a bug in the spec: it says that > every xs:double value in the range 0 to 1 should be equally likely to > appear, but that's not what you actually want, because there are about > as many xs:double values in the range 0 to 0.1 as there are in the > range 0.1 to 1.0. Fortunately implementors are unlikely to have taken > much notice of that provision; but it does illustrate the dangers. You're right! I was so confident that what it said was what it should say that I misread The value of the number entry should be such that all eligible xs:double values are equally likely to be chosen. as saying that all real values in the interval are equally likely to be chosen, i.e. that the values would be selected from a uniform distribution over the interval. At some point I will doubtless also want to select values from other distributions, too, but I suspect that there are ways to make that happen. > If you want a completely uniform distribution, consider using the > permute() option. But of course a uniform distribution is very far > from random (it's well known that a truly random sequence contains far > more duplicates than a typical person will expect). > I think the idea of fn:random-number-generator() is that you can > choose whether you want a repeatable sequence of random numbers or a > different sequence each time. For the latter, use current-dateTime() as > a seed. My situation, I regret to say, is that if possible I want both: I would like a given set of a hundred or a thousand or ten thousand simulation runs to be repeatable, and I would also like each run in such a set to have a different sequence. (And, if possible, I would like each set to vary, in a reproducible way.) But if I use current-dateTime() or some other method to introduce variation into the initial seed, I can of course record the seed used and use it again if I want to replicate the simulations. I suspect that for my current code, which tends to use just the first two values of the sequence of numbers generated by a given seed, I will want more variation in the seed that current-dateTime() will give: a minimally conforming processor has about 3.1E14 different dateTime values (and the simulation runs in a set will vary only in minute, second, and millisecond, so maybe 1E5 different values), which is a lot less than 4.5E18, which the Web tells me is roughly the number of double precision values in the interval [0,1]. (I believe I saw a story once about a flawed online poker system whose card-shuffling routine a 32-bit random number to shuffle the cards, with the result that there were about 2^32 b 4E9 possible hands, instead of 52! b 8e67, and worse yet used milliseconds-since-midnight as the seed, so those who broke the system just needed to try a few hundred or a few thousand clock values to find the value that produced the hand they were holding, at which point they could see everyone else's hand, too. This has led me to believe that -- especially if one is taking the first number generated from a given seed -- it pays if the range of possible seeds is about the same size as the range of possible results of the random number generator, and/or the range of possible phenomena being selected.) > I'm sure you're right that you want a single "flat" sequence of random > numbers, you don't want a branching sequence; and achieving a "flat" > sequence when you're doing recursive tree traversal isn't > straightforward. Using an accumulator is an interesting idea. I've > previously used xsl:number level="any" to index into a sequence of > pre-alllocated random numbers. That seems like a feasible idea: before calling apply-templates for a given time slice I can generate a sequence of random numbers (it's easy to know in advance how many are needed) and index into it in the way you describe. > You seem to be doing all the right things. You say the results are > "disappointing", and I feel I'd like to know more about what that > means. That's a good question. The main symptom so far is that the first hundred times I ran the simulation, 80 of the runs produced the same trivial result: a birth-and-death process in which the initial individual died before reproducing - the runs differed on how long that individual lived, so they were not completely identical, but the family trees they produced were isomorphic: one node with birth and death dates. It is possible, of course, that that form of result is more probable than I had expected. So with a little effort I calculated the probability that the birth and death rates I was using should produce the result. The probability *is* higher than I had expected, but if my calculation is accurate it's about 0.32, not about 0.80. > ... > Specifically: a small change in the seed results in only a small > change in the first value in the sequence. (The Saxon implementation > calls Java's Random class passing a seed which is the hash code of the > seed passed at the XPath level. For integers, the hash code of a small > integer is the integer itself, which may well have something to do > with it.) > > If I multiply the supplied seed by 987654321 before passing it to > Java, the pattern looks a lot more "random": > > ... > > Alternatively, discard the first couple of items in the sequence. An experiment I just ran suggests that discarding two values and taking the third for each key does indeed lead to a much wider spread in the resulting numbers; even discarding a single value helps a lot. Perhaps that will help. Discarding ten appears to be more than is needed. I will try one or more of these approaches and see what happens. If and when the simulation starts to produce results that match the probabilities I am able to calculate for some simple cases (like the one-individual case described above), then I will begin to have more confidence in my simulation. -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
Re: [xsl] help with random numbers, Michael Kay michaelk | Thread | Re: [xsl] help with random numbers, Dimitre Novatchev dn |
Re: [xsl] help with random numbers, Michael Kay michaelk | Date | Re: [xsl] help with random numbers, Dimitre Novatchev dn |
Month |