Dieter, thank you for your reply and I understand what you are saying. However, I had heard from a reputable consultant that sample is simply returning the first x records from a given Amp that meet the query's criteria. Each time you run the sample query, Teradata is just selecting a different Amp, but still taking the FIRST x obs. Thus, not truly random. Any thoughts?
Just submit a
SEL PI_col, HASHAMP(HASHBUCKET(HASHROW(PI_col)))
FROM tab
SAMPLE xx
and you'll see that there are rows from every AMP.
The requested sample size is divided by the number of rows and each AMP retrieves a part of the sample.
If you use SAMPLE RANDOMIZED ALLOCATION in V2R5 the you'll get an even better sample, but you need to be a statistician to recognize that ;-)
Dieter
Running on the demo version with 2 AMPs:
SEL "AMP", count(*)
FROM
(
SEL HASHAMP(HASHBUCKET(HASHROW(cdate))) AS "AMP"
FROM sys_calendar.caldates
SAMPLE 1000
) dt
GROUP BY 1;
*** Query completed. 2 rows found. 2 columns returned.
*** Total elapsed time was 1 second.
AMP Count(*)
----------- -----------
0 500
1 500
SEL "AMP", count(*)
FROM
(
SEL HASHAMP(HASHBUCKET(HASHROW(cdate))) AS "AMP"
FROM sys_calendar.caldates
SAMPLE RANDOMIZED ALLOCATION 1000
) dt
GROUP BY 1;
*** Query completed. 2 rows found. 2 columns returned.
*** Total elapsed time was 1 second.
This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
By continuing to use this site, you are consenting to our use of cookies.