Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Removing duplicates 2

Status
Not open for further replies.

cosmid

Programmer
Feb 14, 2008
73
US
Wow! I can't believe how hard this is with SAS. How do you remove all duplicated rows?

For example, if I have the following data set:

OBS Letter
1 A
2 A
3 A
4 B
5 C
6 D

I want a data set that removes all the As. Like the following:

OBS Letter
1 B
2 C
3 D

This should be something easy to do. But I find it almost impossible with SAS.
 
I have to write it using SQL to get the result I wanted.

proc sql;
create table no_duplicates as
select count(*) as total, Letter
from mydata
group by Letter
having total < 2;

I guess I was looking for a function that can do this. The SAS nodup in proc sort doesn't do it. It works almost like distinct from SQL. It does not remove the duplicated records and only print them out once or write them out once to the file.
 
Hi Cosmid,

That is the standard way to handle it in SQL, but I find the datastep solution even simpler

Code:
data have;
input letter$ @@;
cards;
A A A B C D
;run;

data dstep;
   set have;
   by letter;
   if first.letter and last.letter;
   run;
proc print;run;

You could of course also handle this with a proc means/summary to get the frequencies, and use a where statement to get the unique records. It does mean you have to keep the _freq_ variable in your dataset, but you can easily drop that on any reads.

Code:
proc summary data=have nway;
   class letter;
   output out=psummary(where=(_freq_ eq 1));
   run;
proc print;run;

HTH
 
wow! thanks! that if statement is great. and I never knew how to use the _freq_ variable. Thanks a lot!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top