Re: [xquery-talk] [xsl] Re: Random number generation : requirements

Subject: Re: [xquery-talk] [xsl] Re: Random number generation : requirements
From: "Wolfgang Laun wolfgang.laun@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 7 May 2014 06:31:26 -0000
I think that a random() in XSLT should be provided in a way that lets
you run several random number generators (of the same kind) in
parallel. With many generators, a sequence whose elements all come from
successive calls of a single generator can behave very differently from
one built by calling a sufficient number of generators in turn.

For instance, in Dimitre's example the values returned alternate between
even and odd, so using them to generate random points (x,y) in 2D
omits 50% of the possible points. This is typical of an entire
class of random generators.
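
To make the even/odd point concrete, here is a minimal Python sketch. It
assumes a linear congruential generator with a power-of-two modulus and an
odd multiplier and increment; that is an illustrative stand-in and not
necessarily the generator in Dimitre's example. The names lcg and points
and the parameter values are chosen here for illustration only.

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Yield successive states of x -> (a*x + c) mod m (assumed parameters)."""
    state = seed
    while True:
        state = (a * state + c) % m
        yield state

def points(seed, count, size=10):
    """Build (x, y) points from consecutive outputs of ONE generator.

    size must be even so that reducing modulo size preserves parity.
    """
    gen = lcg(seed)
    return [(next(gen) % size, next(gen) % size) for _ in range(count)]

pts = points(seed=42, count=10_000)
# With a and c odd and m even, each output flips the parity of the previous
# one, so x and y always have opposite parity and x + y is always odd.
all_odd = all((x + y) % 2 == 1 for x, y in pts)
distinct = len(set(pts))
print(all_odd, distinct, "of", 10 * 10, "grid points reached")
```

Every point with x + y even is unreachable, which is the 50% mentioned
above.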

-W


On 07/05/2014, Michael Sokolov msokolov@xxxxxxxxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> My 2c:
>
> I used an XQuery function based on Dimitre's version before; it works
> fine although it's a little inconvenient to have to keep passing in the
> prior value.
>
> I would say the most convenient (or at least the most familiar)
> signature for a random function is random($n) returning a random number
> between 0 inclusive and $n exclusive; ideally it would return integers
> if $n is an integer, floating point numbers if $n is a floating point
> number, empty if $n is empty, and an error otherwise.  And I would like
> a seed function.  Ideally this should be callable many times: I'm not
> sure how that could be done non-deterministically though.
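
A rough Python sketch of the signature described above, kept deterministic
by passing the generator state in and out explicitly (the inconvenience
Mike mentions). The names random_below and next_state, the generator
parameters, and the choice of ValueError for non-positive $n are all
assumptions made for this sketch, not part of any proposal in the thread.

```python
M = 2**31  # modulus of the assumed generator

def next_state(state):
    """Advance the assumed linear congruential generator by one step."""
    return (1103515245 * state + 12345) % M

def random_below(n, state):
    """Return (value, new_state) with 0 <= value < n.

    Integer n gives an integer, float n gives a float, None gives None
    (and leaves the state untouched); anything else is an error.
    """
    if n is None:
        return None, state
    if isinstance(n, bool) or not isinstance(n, (int, float)):
        raise TypeError("n must be an integer, a float, or None")
    if n <= 0:
        raise ValueError("n must be positive")
    new_state = next_state(state)
    fraction = new_state / M  # in [0, 1)
    value = int(fraction * n) if isinstance(n, int) else fraction * n
    return value, new_state

state = 2014  # the "seed" is simply the initial state
roll, state = random_below(6, state)    # an integer in 0..5
real, state = random_below(1.0, state)  # a float in [0, 1)
```

Because the caller owns the state, the same seed always reproduces the
same values, and the caller can draw as many numbers as turn out to be
needed.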
>
> I suppose a sequence would be useful, but it isn't the first thing that
> leaps to mind.  What if I'm not sure how many I'll need?
>
> For example, one use case for me was to load a huge amount of data, and
> only include 1% of it, in order to generate a predictable test data
> sub-set. I want to write an XSLT template that returns nothing 99% of
> the time, and for the other 1% of the time it processes the content
> normally.  I want this to be based on an identifier in the content so
> that for a given seed, the same "random" 1% are selected each time: it
> should *not* be order-dependent, rather I would like to seed the random
> number generator with a hash of a given seed that is a configuration
> parameter, and a node-identifier, and then evaluate the next random
> number to see if it is > 0.01 (say).  Maybe there are other ways to do
> that, but that is what I did using Java.
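
One possible shape for this order-independent selection, sketched in
Python rather than Java. It simplifies the approach described above: the
hash of (seed, node identifier) is used directly as the random value
instead of seeding a generator and drawing from it. The function name
selected, the ":" separator, and the choice of SHA-256 are assumptions for
illustration. A node is kept when its value falls below the fraction.

```python
import hashlib

def selected(seed, node_id, fraction=0.01):
    """Return True for roughly `fraction` of node ids, deterministically.

    The decision depends only on (seed, node_id), never on processing order.
    """
    digest = hashlib.sha256(f"{seed}:{node_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number between 0 and 1.
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < fraction

ids = [f"doc-{i}" for i in range(100_000)]
kept = [i for i in ids if selected("config-seed", i)]
print(len(kept), "of", len(ids), "nodes kept")
```

Processing the identifiers in a different order, or in separate runs,
keeps the same subset for a given seed.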
>
> -Mike
>
>
> On 5/6/2014 6:58 PM, Michael Kay wrote:
>> The big problem with a nondeterministic random() function is not defining
>> the order of execution, but preventing it being optimised out of a loop.
>> For example, how do we ensure that
>>
>> $xxx[random() gt 0.5]
>>
>> doesn't select either all the values or none?
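
The optimisation hazard can be imitated in Python. The sketch below
contrasts evaluating the predicate once per item with evaluating it once
for the whole filter, which is what hoisting a supposedly constant
expression out of the loop would amount to. The function names are
invented for this illustration; Python itself performs no such hoisting.

```python
import random

def filter_per_item(items, rng):
    """Draw a fresh random number for every item (the intended meaning)."""
    return [x for x in items if rng.random() > 0.5]

def filter_hoisted(items, rng):
    """Draw once and reuse the result, as if the test were loop-invariant."""
    keep = rng.random() > 0.5
    return [x for x in items if keep]

items = list(range(1000))
per_item = filter_per_item(items, random.Random(1))
hoisted = filter_hoisted(items, random.Random(1))
print(len(per_item), len(hoisted))
```

The hoisted version can only ever return all of the items or none of
them.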
>>
>> Anyway, we're not planning to do non-determinism. This exercise is about
>> designing a deterministic way to meet the requirement.
>>
>> Michael Kay
>> Saxonica
>>
>> On 6 May 2014, at 23:48, Michael Sokolov <msokolov@xxxxxxxxxxxxxxxxxxxxx>
>> wrote:
>>
>>> On 5/6/2014 6:41 PM, Michael Kay mike@xxxxxxxxxxxx wrote:
>>>>> My policy on side effects is: all expressions containing side effects
>>>>> are going to be evaluated in order
>>>>>
>>>> I do something like that in Saxon as well. But I don't attempt to define
>>>> what "in order" means; for example, the order in which different global
>>>> variables are evaluated. Doing this in the spec would be much more
>>>> problematic.
>>>>
>>> You don't think it would be reasonable to say something to the effect
>>> that the order in which non-deterministic expressions are evaluated is
>>> non-deterministic (i.e. implementation-defined)? Certainly it would be
>>> reasonable enough in the case of a random number generator.  Although I
>>> suppose if you are going to seed it, you would like the seed to affect
>>> the random numbers that are generated.
>>>
>>> -Mike
