Re: [xsl] help with random numbers

Subject: Re: [xsl] help with random numbers
From: "Michael Kay michaelkay90@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 17 Nov 2022 00:35:28 -0000
[Trying again from a new email address; xsl-list is rejecting postings from my
shiny new Office 365 mail account. Probably with good reason...]

Worth noting in passing that there's a bug in the spec: it says that every
xs:double value in the range 0 to 1 should be equally likely to appear, but
that's not what you actually want, because xs:double values are packed far
more densely near zero: there are many more of them in the range 0 to 0.1
than in the range 0.1 to 1.0. Fortunately implementors are unlikely to have
taken much notice of that provision; but it does illustrate the dangers.
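
To put a rough number on that imbalance, here is a minimal Java sketch. It
assumes only that xs:double and Java's double are both IEEE 754 binary64; the
class and method names are made up for illustration. For non-negative doubles
the raw bit patterns sort in the same order as the values, so subtracting bit
patterns counts the representable values in a range.

```java
class DoubleDensity {
    // Number of distinct doubles d with lo <= d < hi, for 0 <= lo <= hi.
    // Non-negative IEEE 754 doubles are ordered like their bit patterns,
    // so the difference of the raw bits is the count of values in between.
    static long countInRange(double lo, double hi) {
        return Double.doubleToLongBits(hi) - Double.doubleToLongBits(lo);
    }

    public static void main(String[] args) {
        long low = countInRange(0.0, 0.1);
        long high = countInRange(0.1, 1.0);
        System.out.println("doubles in [0, 0.1):   " + low);
        System.out.println("doubles in [0.1, 1.0): " + high);
        System.out.println("ratio (integer division): " + (low / high));
    }
}
```

The lower range comes out a few hundred times more populous, so picking every
xs:double with equal probability would pile almost all results up near zero.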

If you want a completely uniform distribution, consider using the permute()
option. But of course a uniform distribution is very far from random (it's
well known that a truly random sequence contains far more duplicates than a
typical person will expect).
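
The difference between the two can be sketched in Java, using
Collections.shuffle as a stand-in for permute() (an analogy only, not the
XPath function itself; the class name, method names and seed are
hypothetical). Independent draws from 0..99 repeat themselves freely, while a
permutation of 0..99 contains each value exactly once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

class PermuteVsDraw {
    // Draw n independent values from 0..n-1; duplicates are allowed.
    static List<Integer> independentDraws(int n, long seed) {
        Random rng = new Random(seed);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(rng.nextInt(n));
        }
        return out;
    }

    // Shuffle 0..n-1; every value appears exactly once.
    static List<Integer> permutation(int n, long seed) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(i);
        }
        Collections.shuffle(out, new Random(seed));
        return out;
    }

    // How many different values the list contains.
    static int distinctCount(List<Integer> values) {
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        int n = 100;
        System.out.println("distinct values in independent draws: "
                + distinctCount(independentDraws(n, 12345L)));
        System.out.println("distinct values in a permutation:     "
                + distinctCount(permutation(n, 12345L)));
    }
}
```

For 100 independent draws from 100 values the expected number of distinct
values is only about 63, which is the "more duplicates than you expect"
effect.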

I think the idea of fn:random-number-generator() is that you can choose
whether you want a repeatable sequence of random numbers or a different
sequence each time. For the latter, use current-dateTime() as a seed.

I'm sure you're right that you want a single "flat" sequence of random
numbers rather than a branching sequence; and achieving a "flat" sequence
when you're doing recursive tree traversal isn't straightforward. Using an
accumulator is an interesting idea. I've previously used xsl:number
level="any" to index into a sequence of pre-allocated random numbers.
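
The pre-allocation idea can be sketched outside XSLT as follows. This is a
minimal Java illustration, not the stylesheet technique itself: the Node
class, the sample tree and the seed are all hypothetical, and the running
index merely plays the role that a document-order node number would play.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class FlatRandomTraversal {
    // Hypothetical stand-in for an XML element.
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        double random; // assigned during the walk

        Node(String name) { this.name = name; }

        Node add(Node child) { children.add(child); return this; }
    }

    // Number of nodes in the subtree, i.e. how many numbers to pre-allocate.
    static int size(Node n) {
        int total = 1;
        for (Node c : n.children) total += size(c);
        return total;
    }

    // Pre-allocate one number per node from a single seeded generator.
    static double[] pool(Node root, long seed) {
        return new Random(seed).doubles(size(root)).toArray();
    }

    // Pre-order walk. 'index' is the node's zero-based position in document
    // order; the method returns the next unused index, so every node draws
    // from the same flat pool and no branch gets its own sub-sequence.
    static int assign(Node n, double[] pool, int index) {
        n.random = pool[index++];
        for (Node c : n.children) index = assign(c, pool, index);
        return index;
    }

    static void print(Node n, String indent) {
        System.out.println(indent + n.name + " -> " + n.random);
        for (Node c : n.children) print(c, indent + "  ");
    }

    public static void main(String[] args) {
        Node root = new Node("a")
                .add(new Node("b").add(new Node("c")))
                .add(new Node("d"));
        assign(root, pool(root, 42L), 0);
        print(root, "");
    }
}
```

Because the pool is built once, up front, the recursion never has to pass a
generator down two different branches.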

You seem to be doing all the right things. You say the results are
"disappointing", and I feel I'd like to know more about what that means.

> I notice that if I call the random number generator a hundred times,
> using the integers 0 to 99 as seeds, the results I get don't vary much:
> the first few digits of the number are often the same.

-- that's an interesting result, and not one I would have expected. But I
observe the same effect:

<out xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <seq seed="1">0.7308781907032909, 0.7257102896080766, 0.9009743392137958,
0.4826910902346161, 0.29315627309697423, </seq>
  <seq seed="2">0.7311469360199058, 0.5517915247420786, 0.7205806183384335,
0.14830673855032783, 0.13395953391975945, </seq>
  <seq seed="3">0.731057369148862, 0.020985659316413385, 0.6164149953316664,
0.45948490983162815, 0.6752260705460803, </seq>
  <seq seed="4">0.7306094602878371, 0.09774842741516032, 0.2556541228947684,
0.8005857957678437, 0.8621923601841234, </seq>
  <seq seed="5">0.730519863614471, 0.07998108603757748, 0.2125018509965486,
0.48858213869922595, 0.6965405606778377, </seq>
  <seq seed="6">0.7307886238322471, 0.31276518053707425, 0.934337937391265,
0.06244780059793276, 0.24460136659747278, </seq>
  <seq seed="7">0.7306990420600421, 0.5281776933716661, 0.02118549810581838,
0.31162842626573717, 0.6340637397326755, </seq>
  <seq seed="8">0.7302511331990172, 0.3987912342492226, 0.7473617064844028,
0.6475229566651487, 0.4049770896786319, </seq>
  <seq seed="9">0.7301615514268123, 0.1749122978283867, 0.4648224333955123,
0.15519236194077113, 0.7151437504769182, </seq>

Specifically: a small change in the seed results in only a small change in the
first value in the sequence. (The Saxon implementation calls Java's Random
class passing a seed which is the hash code of the seed passed at the XPath
level. For integers, the hash code of a small integer is the integer itself,
which may well have something to do with it.)
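
The effect on the first value can be reproduced with plain java.util.Random,
leaving Saxon out of the picture. The sketch below (class and method names
are illustrative) prints the first nextDouble() for seeds 1 to 9. As far as
the Random Javadoc describes it, the constructor XORs the seed with
0x5DEECE66D and each step multiplies the 48-bit state by that same constant,
with the output taken from the top bits of the state; 0x5DEECE66D divided by
2^48 is roughly 0.00009, so a small change in the seed moves the first value
by only a small multiple of that amount.

```java
import java.util.Random;

class SeedSpread {
    // First value that java.util.Random yields for a given seed.
    static double firstDouble(long seed) {
        return new Random(seed).nextDouble();
    }

    public static void main(String[] args) {
        double min = 1.0;
        double max = 0.0;
        for (long seed = 1; seed <= 9; seed++) {
            double d = firstDouble(seed);
            min = Math.min(min, d);
            max = Math.max(max, d);
            System.out.println("seed " + seed + " -> " + d);
        }
        // The whole spread is tiny compared with the [0, 1) range.
        System.out.println("spread: " + (max - min));
    }
}
```

Only the first value of each sequence is being compared here; the sketch
makes no claim about how Saxon derives the later items.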

If I multiply the supplied seed by 987654321 before passing it to Java, the
pattern looks a lot more "random":

<out xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <seq seed="1">0.3059397179452866, 0.4869093854903047, 0.6639783175664368,
0.7495350431609044, 0.6061717881539085, </seq>
  <seq seed="2">0.6692012803942295, 0.7473753964009529, 0.812854444433305,
0.15951096061541514, 0.540383410407191, </seq>
  <seq seed="3">0.9364004047214378, 0.514819192857016, 0.37644462624058217,
0.578710993412831, 0.8772782385681681, </seq>
  <seq seed="4">0.13658330308329614, 0.49395518455593135, 0.15082479024391615,
0.3545314545482414, 0.535727249283122, </seq>
  <seq seed="5">0.5630469574976565, 0.6790289901736978, 0.418555112475819,
0.8648928938073236, 0.5908563006507326, </seq>
  <seq seed="6">0.8692373437202477, 0.22583738877915016, 0.1907136527639004,
0.9107553550322145, 0.35730532904145773, </seq>
  <seq seed="7">0.32973906482709403, 0.7496030268078676, 0.8384802335047545,
0.38795997991549147, 0.1587306195424475, </seq>
  <seq seed="8">0.06942024208210595, 0.3471030618650712, 0.6978346744515216,
0.5048761583523367, 0.19237692600778022, </seq>

Alternatively, discard the first couple of items in the sequence.
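
Both workarounds can be sketched against java.util.Random directly. This is
an illustration under stated assumptions, not the Saxon code: the class and
method names are hypothetical, the multiplication is done in long arithmetic
(the original experiment may have used a different integer width), and
"discarding" here means skipping nextDouble() calls on one Random instance.

```java
import java.util.Random;

class SeedWorkarounds {
    // Workaround 1: spread neighbouring seeds far apart before seeding.
    // Assumption: long arithmetic, with the multiplier quoted above.
    static double firstWithScaledSeed(long seed) {
        return new Random(seed * 987654321L).nextDouble();
    }

    // Workaround 2: throw away the first 'discard' outputs, then take one.
    static double firstAfterDiscarding(long seed, int discard) {
        Random rng = new Random(seed);
        for (int i = 0; i < discard; i++) {
            rng.nextDouble();
        }
        return rng.nextDouble();
    }

    public static void main(String[] args) {
        for (long seed = 1; seed <= 9; seed++) {
            System.out.println("seed " + seed
                    + "  scaled: " + firstWithScaledSeed(seed)
                    + "  after discarding 2: " + firstAfterDiscarding(seed, 2));
        }
    }
}
```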

Michael Kay
Saxonica
