Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Help: using SAS Macro to create complex data set

Status
Not open for further replies.

NiuYa

Programmer
May 30, 2011
2
US
I have a data set similar to the following simple data set. With the four column of Account, Date, Time and ID, I like to using SAS macro to create a column like "HIT" with 0 and 1 value. The logic is " there is more than 1 of ID used within 2 days from the same account." If the logic is satisfied, then the variable HIT is 1, else is 0.



Account date Time ID hit
387000 6/10/2010 14:00:00 13.15 1
387000 6/9/2010 13:00:00 13.15 0
387000 6/8/2010 12:00:00 13.14 1
387000 6/6/2010 11:00:00 13.15 1
387000 6/5/2010 10:00:00 13.14 1
387000 6/4/2010 8:00:00 13.13 0
387000 6/4/2010 9:00:00 13.13 0
386000 5/9/2010 10:00:00 12.12 0
386000 5/2/2010 9:00:00 12.14 1
386000 5/1/2010 8:00:00 12.13 0
385000 3/3/2010 10:00:00 11.13 1
385000 3/1/2010 8:00:00 11.12 0
385000 3/1/2010 9:00:00 11.12 0



Below is my logic trying to create this variable. But I cannot continue anymore. I would really appreciate someone can help to instruct me out of this puzzle. Thank you very much!


proc sort data=file; by account decending date decending time;run;

%macro test;

proc sql noprint;
select count(distinct account)
into :num_cin
from file;
quit;
%let num_cin=&num_cin;
%put num_cin=&num_cin;

proc sql noprint;
select distinct account
into :CIN1-:CIN&num_cin
from file;
quit;
%put CIN1-:CIN&num_cin ;

**assign date **;
%do i=1 %to &num_cin;

proc sql noprint;
select count(date)
into :num_date
from file
where account=&&cin&i
;
quit;
%let num_date=&num_date;
%put num_date=&num_date &&cin&i;

proc sql noprint;
select date
into :dat1-:dat&num_date
from file a
where account=&&cin&i
;
quit;


%end;
%mend test;
%test;
 
I tried to revised the above example to make it clear.

Account date Time ID hit
387000 6/10/2010 12:00:00 13.15 1
387000 6/9/2010 12:00:00 13.15 1
387000 6/8/2010 12:00:00 13.14 1
387000 6/6/2010 11:00:00 13.15 1
387000 6/5/2010 10:00:00 13.14 1
387000 6/4/2010 9:00:00 13.13 0
387000 6/4/2010 8:00:00 13.13 0
386000 5/9/2010 10:00:00 12.12 0
386000 5/2/2010 9:00:00 12.14 1
386000 5/1/2010 8:00:00 12.13 0
385000 3/3/2010 10:00:00 11.13 1
385000 3/1/2010 8:00:00 11.12 0
385000 3/1/2010 9:00:00 11.12 0

For example,
(1) For account 387000, since on 06/10/2010, the ID used is 13.15, it is different than the ID of 13.14 used on 06/08/2010 (which is within two days of 06/10/2010), so the hit is 1.

(2) For Account 387000, since on 06/09/2010, the ID used is 13.15, it is different than the ID of 13.14 used on 06/08/2010(which is within two days of 06/09/2010), so the hit is 1.

(3)For Account 387000, since on 06/04/2010 at 9:00am, the ID used is 13.13, within 2 days, the only date is another ID of same 13.13 on 06/04/2010 at time of 8:00am. The hit is 0.

(4)For account 386000, within 2 days of 05/09/2010 would be 05/08/2010 or 05/07/2010. But there is no date likes these two, so the hit is 0.

 
Not something that I would try to solve with Macro, but definately better handled with the datastep or SQL.

Here is an approach that uses a cartesian product (compare every record with every other record), and then whittles the list down based on the criteria you mentioned above. We get some discrepancies on which records we have a hit, mine are called x:

Code:
data have(drop =_:) ;
   input Account (_date _Time)(:$10.) ID hit ;
   date = input(_date, anydtdte.) ;
   time = input(_time, anydttme.) ;
   dttime = dhms(date,0,0,time) ;
   format date date9. time time5. dttime datetime20. ;
cards ;
387000 6/10/2010 12:00:00 13.15 1
387000 6/9/2010 12:00:00 13.15 1
387000 6/8/2010 12:00:00 13.14 1
387000 6/6/2010 11:00:00 13.15 1
387000 6/5/2010 10:00:00 13.14 1
387000 6/4/2010 9:00:00 13.13 0
387000 6/4/2010 8:00:00 13.13 0
386000 5/9/2010 10:00:00 12.12 0
386000 5/2/2010 9:00:00 12.14 1
386000 5/1/2010 8:00:00 12.13 0
385000 3/3/2010 10:00:00 11.13 1
385000 3/1/2010 8:00:00 11.12 0
385000 3/1/2010 9:00:00 11.12 0
;;;
proc sql ;
create table want(drop=n dttime) as 
   select h1.*, monotonic() as n, (h2.id ne h1.id and h2.id is not null) as x
   from have h1 left join have h2
   on 0<(abs(h1.dttime-h2.dttime)/(60*60*24))<= 2 
   and h1.account = h2.account
   group by h1.account, h1.dttime, h1.id
   having max(x) =x and min(n) =n;
quit; 
proc print; run;

output
Code:
 Obs    Account      ID     hit         date     time    x

 1     385000    11.12     0     01MAR2010     8:00    0
 2     385000    11.12     0     01MAR2010     9:00    0
 3     385000    11.13     1     03MAR2010    10:00    0
 4     386000    12.13     0     01MAY2010     8:00    1
 5     386000    12.14     1     02MAY2010     9:00    1
 6     386000    12.12     0     09MAY2010    10:00    0
 7     387000    13.14     1     05JUN2010    10:00    1
 8     387000    13.15     1     06JUN2010    11:00    1
 9     387000    13.14     1     08JUN2010    12:00    1
10     387000    13.15     1     09JUN2010    12:00    1
11     387000    13.15     1     10JUN2010    12:00    1
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top