Re: [xquery-talk] [xsl] Re: Random number generation : requirements

Subject: Re: [xquery-talk] [xsl] Re: Random number generation : requirements
From: "Michael Sokolov msokolov@xxxxxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 6 May 2014 23:16:08 -0000
My 2c:

I used an XQuery function based on Dmitry's version before; it works fine although it's a little inconvenient to have to keep passing in the prior value.
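To make "passing in the prior value" concrete, here is a minimal Python sketch of a pure generator whose next value depends only on the previous one. It is not Dmitry's function: the names `next_random` and `take` are hypothetical, and the constants are the commonly cited "Numerical Recipes" linear congruential parameters, chosen only for illustration.

```python
# Hypothetical sketch: a side-effect-free generator where the caller
# threads the prior value through every call.
M = 2**32          # modulus (assumed for illustration)
A = 1664525        # multiplier (assumed for illustration)
C = 1013904223     # increment (assumed for illustration)


def next_random(prev):
    """Return the next state; the result depends only on the prior value."""
    return (A * prev + C) % M


def take(seed, count):
    """Thread the state through `count` calls, scaling each value into [0, 1)."""
    values = []
    state = seed
    for _ in range(count):
        state = next_random(state)   # the caller must keep passing this along
        values.append(state / M)
    return values
```

The inconvenience is visible in `take`: every caller has to carry `state` around explicitly, which is the price of keeping the function deterministic.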

I would say the most convenient (or at least the most familiar) signature for a random function is random($n) returning a random number between 0 inclusive and $n exclusive; ideally it would return integers if $n is an integer, floating point numbers if $n is a floating point number, empty if $n is empty, and an error otherwise. And I would like a seed function. Ideally this should be callable many times: I'm not sure how that could be done non-deterministically though.
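A rough Python sketch of that signature, under stated assumptions: `random_below` and `seed` are hypothetical names, `None` stands in for the empty sequence, and the shared generator object is exactly the kind of hidden state that a deterministic XPath/XQuery function could not rely on.

```python
import math
import random as _random

# Shared generator: hidden mutable state, used here only for illustration.
_rng = _random.Random()


def seed(value):
    """Reseed the shared generator so that later calls are repeatable."""
    _rng.seed(value)


def random_below(n):
    """Return a value in [0, n): int for int n, float for float n, None for None."""
    if n is None:
        return None                      # "empty in, empty out"
    if isinstance(n, bool):
        raise TypeError("n must be a number, not a boolean")
    if isinstance(n, int):
        if n <= 0:
            raise ValueError("n must be positive")
        return _rng.randrange(n)         # integer in 0 .. n-1
    if isinstance(n, float):
        if not (n > 0 and math.isfinite(n)):
            raise ValueError("n must be positive and finite")
        return _rng.random() * n         # float scaled from [0, 1)
    raise TypeError("n must be an int, a float, or None")
```

Calling `seed(7)` before two identical runs of `random_below(10)` gives the same series both times, which is the behaviour wanted from a seed function.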

I suppose a sequence would be useful, but it isn't the first thing that leaps to mind. What if I'm not sure how many I'll need?

For example, one use case for me was to load a huge amount of data, and only include 1% of it, in order to generate a predictable test data sub-set. I want to write an XSLT template that returns nothing 99% of the time, and for the other 1% of the time processes the content normally. I want this to be based on an identifier in the content so that for a given seed, the same "random" 1% are selected each time: it should *not* be order-dependent, rather I would like to seed the random number generator with a hash of a given seed that is a configuration parameter, and a node-identifier, and then evaluate the next random number to see if it is > 0.01 (say). Maybe there are other ways to do that, but that is what I did using Java.
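The approach described above can be sketched in Python roughly as follows. This is not the original Java code: the function names, the choice of SHA-256, and the NUL separator between the seed and the identifier are all assumptions made for illustration. In this sketch a node is kept when its value falls below the fraction, and skipped otherwise.

```python
import hashlib
import random


def selection_value(seed, node_id):
    """Map (seed, node_id) to a repeatable value in [0, 1)."""
    # Hash the configured seed together with the node identifier.
    key = f"{seed}\x00{node_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Seed a fresh generator from the hash and take its next number,
    # so the result never depends on which nodes were seen earlier.
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return rng.random()


def is_selected(seed, node_id, fraction=0.01):
    """Return True for roughly `fraction` of all node identifiers."""
    return selection_value(seed, node_id) < fraction


# Usage: the same subset is chosen regardless of processing order.
ids = [f"doc-{i}" for i in range(1000)]
forward = {i for i in ids if is_selected("test-seed", i)}
backward = {i for i in reversed(ids) if is_selected("test-seed", i)}
assert forward == backward
```

Because each decision is a pure function of the seed and the identifier, no generator state has to be carried from one node to the next.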

-Mike


On 5/6/2014 6:58 PM, Michael Kay wrote:
The big problem with a nondeterministic random() function is not defining the order of execution, but preventing it being optimised out of a loop. For example, how do we ensure that

$xxx[random() gt 0.5]

doesn't select either all the values or none?

Anyway, we're not planning to do non-determinism. This exercise is about designing a deterministic way to meet the requirement.

Michael Kay
Saxonica

On 6 May 2014, at 23:48, Michael Sokolov <msokolov@xxxxxxxxxxxxxxxxxxxxx> wrote:

On 5/6/2014 6:41 PM, Michael Kay mike@xxxxxxxxxxxx wrote:
My policy on side effects is: all expressions containing side effects are going to be evaluated in order

I do something like that in Saxon as well. But I don't attempt to define what "in order" means; for example, the order in which different global variables are evaluated. Doing this in the spec would be much more problematic.

You don't think it would be reasonable to say something to the effect that the order in which non-deterministic expressions are evaluated is non-deterministic (i.e. implementation-defined)? Certainly it would be reasonable enough in the case of a random number generator. Although I suppose if you are going to seed it, you would like the seed to affect the random numbers that are generated.

-Mike