Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Random Seeds in SAMPLE?

Status
Not open for further replies.

unifex

Programmer
Nov 5, 2002
31
GB
I have the joy (!?!?!?) of working in a statistical environment. Unfortunately this means that whenever something comes along claiming to be random colleagues question what random actually means.

So that's what i'm now asking you!!!:

When submitting a query with a SAMPLE clause, what criteria does it use to determine what is random and/or can the user specify a random seed to be used?

I've seen another thread which describes how each amp returns a certain number of rows to generate the random sample but it doesn't give details about how rows are selected.

In other languages (SAS being our main one) we can say something like:

If [Random number] > [percentage] then keep

The random number generator (in SAS) uses a numeric seed which will produce exactly the same sample if run again (this is actually useful for us!). Does Teradata function this way?

I also see there's a RANDOM function - does the SAMPLE clause work in a similar way to this function?

Cheers in advance

Fex
 
There's no seed value.
If you want to reuse the data, materialize it e.g. with a Create Volatile Table as (sel...) with data

If you want a statistically correct sample you'll have to use V2R5:
SAMPLE RANDOMIZED ALLOCATION

SAMPLE probably uses the Random function internally, but if you specify SAMPLE 0.01 the result will be exactly 1 percent. If you use WHERE RANDOM(1,10000) <= 100 it's about 1 percent...

Dieter

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top