
random help 1

Status
Not open for further replies.

sasuser2006

Technical User
May 8, 2006
32
US
I need help randomizing a data set in SAS. What I want to do is read in a data set and, for the records that meet a certain criterion, select a random sample to put in one bucket and put the rest in another bucket.

Say 100 records have x > 10; I want a random 60 put in one data set and the rest put in another data set.

data input; set random1;
if x > 10;
rand=ranuni(60);
proc sort data=input out=random1;
by rand;
run;

I know this isn't correct but can somebody help me modify this so it will achieve what I'm looking for? Thanks all.

 
I usually do this:-
Code:
data sample_stage1;
  set full_data(where=(x>10));
 
  random_number = ranuni(54);
run;

proc sort data=sample_stage1;
  by random_number;
run;

data split1
     split2
     residue;
  set sample_stage1;

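  * Use ONE of the two alternatives below, not both. *;

  * Alternative 1: first 60 records to split1, the rest to split2 *;
  * (residue stays empty). *;
  if _n_ <= 60 then output split1;
  else output split2;

  /* Alternative 2: first 60 to split1, next 40 to split2, any extra to residue.
  if _n_ <= 60 then output split1;
  else if 60 < _n_ <= 100 then output split2;
  else output residue;
  */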
run;
 
GREAT! Thanks a lot, but I have a couple of questions.

Why did you set the full_data but sort the sample_stage1?

Why did you choose ranuni(54) as opposed to ranuni(60) or another number?
 
Also, I forgot to ask: say I don't know how many records meet the criterion, but I do know I want 50% to go one way and 50% to go the other. Is there a way to change...

if _n_ <= 60 then output split1;
else output split2;

to account for percentages?

i.e.

if _n_ <=50% then output split1;
else output split2;

Thanks again.
 
OK.
1 - I didn't set the full dataset; there's a where clause on the SET statement to cut it down. It's generally advisable not to sort more than is necessary. A where clause on the SET statement is more efficient than an if statement in the step, as it is evaluated before the whole record is read in.
2 - The number in ranuni was just random keystrokes. It doesn't really matter what the seed is. A positive seed gives the same sequence on every run; a seed of 0 uses the system clock instead, so the sample changes each time.
3 - If you just want a 50/50 split you can count records:-
Code:
data split1
     split2;
  set sample_stage1;

  retain counter 0;
  counter + 1;
  if counter=1 then output split1;
  if counter=2 then do;
    output split2;
    counter = 0;
  end;
run;
There is a more elegant way of doing this using a function that returns the remainder when dividing by a number (can't remember it off the top of my head). However, I think this way is clearer, and it also makes it ridiculously easy to expand out to split as many ways as you want.
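The remainder function mentioned above is most likely MOD. Here is a minimal sketch of the same alternating split, assuming sample_stage1 has already been sorted by random_number as in the earlier step:
Code:
```sas
data split1
     split2;
  set sample_stage1;

  /* mod(_n_, 2) is 1 for odd-numbered records and 0 for even-numbered ones, */
  /* so records alternate between the two output data sets.                  */
  if mod(_n_, 2) = 1 then output split1;
  else output split2;
run;
```
For a three-way split the same idea should work with mod(_n_, 3), testing for remainders 1, 2 and 0.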
If you want a 2:1 split, you can split it into 3, then stick the first 2 together etc.
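For the percentage question, another option is to let SAS report the record count through the NOBS= option on the SET statement and compare _n_ against a fraction of it. This is only a sketch: total_recs is a made-up variable name, and it assumes the x > 10 filter was already applied when sample_stage1 was built, because NOBS= counts every record in the data set and ignores any where clause.
Code:
```sas
data split1
     split2;
  /* nobs= puts the number of records in sample_stage1 into total_recs.   */
  /* total_recs is a hypothetical name; it is not written to the output.  */
  set sample_stage1 nobs=total_recs;

  /* Send the first 50% of the randomly ordered records to split1. */
  if _n_ <= total_recs * 0.5 then output split1;
  else output split2;
run;
```
Changing 0.5 to another fraction, such as 0.6, gives a different split. With an odd number of records the extra record goes to split2.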

 