Removing duplicates 2

cosmid · Jan 23, 2009

Wow! I can't believe how hard this is with SAS. How do you remove all duplicated rows?

For example, if I have the following data set:

OBS Letter
1 A
2 A
3 A
4 B
5 C
6 D

I want a data set that removes all the As. Like the following:

OBS Letter
1 B
2 C
3 D

This should be something easy to do. But I find it almost impossible with SAS.

cosmid · Jan 23, 2009

I have to write it using SQL to get the result I wanted.

proc sql;
create table no_duplicates as
select count(*) as total, Letter
from mydata
group by Letter
having total < 2;

I guess I was looking for a function that can do this. The SAS nodup in proc sort doesn't do it. It works almost like distinct from SQL. It does not remove the duplicated records and only print them out once or write them out once to the file.

kdt82 · Jan 24, 2009

Hi Cosmid,

That is the standard way to handle it in SQL, but I find the datastep solution even simpler

Code:

data have;
input letter$ @@;
cards;
A A A B C D
;run;

data dstep;
   set have;
   by letter;
   if first.letter and last.letter;
   run;
proc print;run;

You could of course also handle this with a proc means/summary to get the frequencies, and use a where statement to get the unique records. It does mean you have to keep the _freq_ variable in your dataset, but you can easily drop that on any reads.

Code:

proc summary data=have nway;
   class letter;
   output out=psummary(where=(_freq_ eq 1));
   run;
proc print;run;

HTH

cosmid · Jan 26, 2009

wow! thanks! that if statement is great. and I never knew how to use the _freq_ variable. Thanks a lot!

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Removing duplicates 2

cosmid

Programmer

cosmid

Programmer

kdt82

Programmer

cosmid

Programmer

Similar threads

Part and Inventory Search

Sponsor