I have the joy (!?) of working in a statistical environment. Unfortunately, this means that whenever something comes along claiming to be random, colleagues question what "random" actually means.
So that's what I'm now asking you:
When I submit a query with a SAMPLE clause, how does Teradata decide which rows end up in the sample, and can the user specify a random seed to be used?
I've seen another thread which describes how each AMP returns a certain number of rows to generate the random sample, but it doesn't give details about how the rows on each AMP are selected.
In other languages (SAS being our main one) we can say something like:
If [Random number] > [percentage] then keep
The random number generator (in SAS) uses a numeric seed which will produce exactly the same sample if run again (this is actually useful for us!). Does Teradata function this way?
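To make the behaviour I'm after concrete, here is a minimal sketch in Python rather than SAS. It is only an illustration of seeded sampling in general and says nothing about how Teradata implements SAMPLE. The function name `bernoulli_sample` and the seed value are made up for the example, and I've assumed a row is kept when its random draw falls below the target fraction.

```python
import random


def bernoulli_sample(rows, fraction, seed):
    """Keep each row independently with probability `fraction`.

    A dedicated generator is seeded explicitly, so the same rows,
    fraction and seed always give the same sample.
    (Hypothetical helper, for illustration only.)
    """
    rng = random.Random(seed)  # private generator, does not touch global state
    # One uniform draw in [0, 1) per row; keep the row if the draw is below the fraction.
    return [row for row in rows if rng.random() < fraction]


rows = list(range(1000))
first = bernoulli_sample(rows, 0.10, seed=12345)
second = bernoulli_sample(rows, 0.10, seed=12345)

# Re-running with the same seed reproduces the sample exactly.
print(first == second)
```

Note that this kind of sampling gives roughly, not exactly, 10% of the rows, and it depends on the rows arriving in the same order each time.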
I also see there's a RANDOM function - does the SAMPLE clause work in a similar way to this function?
Cheers in advance
Fex