Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Random Number Problem

Status
Not open for further replies.

abutony

Programmer
Jul 23, 2007
3
CA
Hi there,

I am trying to get 200 (records) sample data from 50,000 records. I am using the RDNORM function but it's not giving me the random data, instead the output is in chronological order. Here's the code:


TABLE FILE BAN
PRINT
BAN
BAN_STATUS
BAN_TYPE
BAN_AR_BALANCE

COMPUTE RANDOM/D10.2=RDNORM(RANDOM);

BY BAN_BAN
WHERE READLIMIT EQ 50000
ON TABLE HOLD AS 'RDM2A'
END




TABLE FILE RDM2A
PRINT
BAN
BAN_STATUS
BAN_TYPE
BAN_AR_BALANCE
RANDOM

BY BAN_BAN
WHERE RANDOM LT 0
WHERE RECORDLIMIT EQ 100
ON TABLE PCHOLD FORMAT EXCEL
END

Any help would be much appreciated. Thank You.
 
First off, understand that RDNORM returns a random number, such that ALL the random numbers are in a 'normal' (bell shaped) distribution. The values returned are usually between 3 and -3, and the mean of all the numbers is 0, with a standard deviation of 1. This is used primarily for statistical analysis.

For random selection, use the function RDUNIF, which returns a value which is uniformly distributed (all values are equally likely), having values between 0 and 1.

Now, if you want approximately 200 records, you can do the following, in ONE pass:

1. what's the ratio of the number of records desired to the total population? Here, it's 200 out of 50000, or .004.

2. Calculate RDUNIF for each record (as you've done), but ONLY accept records where the result is LE .004

The problem with wanting EXACTLY 200 records is that, then the selection is no longer random (independent). Whether a record gets selected is not only a function of randomness, but also how many records you've already retrieved.

However, if THAT'S what you want, you can do it, but it will take two passes, which you're already doing.

First, calculate the value for RDUNIF with a DEFINE, rather than a COMPUTE. In your first pass, retrieve EVERYTHING, but SORT it by the DEFINEd random number. In your second pass, use RECORDLIMIT to retrieve the first 200 records.

The code might look like this:
Code:
[red]DEFINE FILE BAN
RANDOM/D10.2 [u]WITH BAN[/u]=RDUNIF(RANDOM);
END[/red]

TABLE FILE BAN
PRINT
BAN
BAN_STATUS
BAN_TYPE
BAN_AR_BALANCE
BAN_BAN
[red]BY RANDOM[/red]
WHERE READLIMIT EQ 50000
ON TABLE HOLD AS 'RDM2A'
END

TABLE FILE RDM2A
PRINT
BAN
BAN_STATUS
BAN_TYPE
BAN_AR_BALANCE
RANDOM
BY BAN_BAN
[red]WHERE RECORDLIMIT EQ 200[/red]
ON TABLE PCHOLD FORMAT EXCEL
END

Note the use of 'WITH' in the DEFINE. This says to calculate a new value every time you get a different value for the specified field. It should be at the lowest level of the retrieved structure. If the structure is a flat single segment, then ANY field will do.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top