Random Seeds in SAMPLE?

unifex · Oct 29, 2004

I have the joy (!?!?!?) of working in a statistical environment. Unfortunately this means that whenever something comes along claiming to be random colleagues question what random actually means.

So that's what i'm now asking you!!!:

When submitting a query with a SAMPLE clause, what criteria does it use to determine what is random and/or can the user specify a random seed to be used?

I've seen another thread which describes how each amp returns a certain number of rows to generate the random sample but it doesn't give details about how rows are selected.

In other languages (SAS being our main one) we can say something like:

If [Random number] > [percentage] then keep

The random number generator (in SAS) uses a numeric seed which will produce exactly the same sample if run again (this is actually useful for us!). Does Teradata function this way?

I also see there's a RANDOM function - does the SAMPLE clause work in a similar way to this function?

Cheers in advance

Fex

dnoeth · Oct 29, 2004

There's no seed value.
If you want to reuse the data, materialize it e.g. with a Create Volatile Table as (sel...) with data

If you want a statistically correct sample you'll have to use V2R5:
SAMPLE RANDOMIZED ALLOCATION

SAMPLE probably uses the Random function internally, but if you specify SAMPLE 0.01 the result will be exactly 1 percent. If you use WHERE RANDOM(1,10000) <= 100 it's about 1 percent...

Dieter

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Random Seeds in SAMPLE?

unifex

Programmer

dnoeth

Instructor

Similar threads

Part and Inventory Search

Sponsor