Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Merge by date range, not exact date 1

Status
Not open for further replies.

jonnysnow

Programmer
Apr 10, 2003
36
US
I have one dataset (HOSP) with hospitalization data and another (EVENT) with procedure data. How can I merge these two so that I get a match whenever a procedure was done during a matching hospitalization? The problem is that the data in EVENT may not exactly match any field in HOSP.

Example:
Code:
*****  HOSP  *****

        ID         start            end
1)      099      04/12/2000      04/15/2000
2)      157      02/28/2000      03/01/2000
3)      157      05/15/2001      06/04/2001   (A)
4)      157      08/12/2001      08/20/2001
5)      157      10/03/2001      10/04/2001   (B)
6)      228      03/12/1997      03/15/1997
7)      228      09/12/2001      09/18/2001   (C)


*****  EVENT  *****

        ID       procedure        date
1)      099      surgery         07/18/1998
2)      157      biopsy          05/20/2001   (A)
3)      157      x-ray           06/02/2001   (A)
4)      157      blood work      06/02/2001   (A)
5)      157      MRI             09/17/2001
6)      157      surgery         10/03/2001   (B)
7)      228      surgery         09/14/2001   (C)
8)      228      blood work      11/29/2002

So I want my output to have 5 records, merging (A) in HOSP with each (A) in EVENT, (B) with (B) and (C) with (C).

One attempt didn’t work-- I tried to sort HOSP by “ID” and “start” and EVENT by “ID” and “date”, then merge by ID only (since the dates won’t necessarily be an exact match).

Any thoughts? Thanks!
 
Here is my solution:

data first;
input tempa $ id $ start :mmddyy10. end :mmddyy10.;
format start end date9.;
datalines;
1) 099 04/12/2000 04/15/2000
2) 157 02/28/2000 03/01/2000
3) 157 05/15/2001 06/04/2001
4) 157 08/12/2001 08/20/2001
5) 157 10/03/2001 10/04/2001
6) 228 03/12/1997 03/15/1997
7) 228 09/12/2001 09/18/2001
;

data second;
input tempb $ id $ procedure &$15. date :mmddyy10.;
format date date9.;
datalines;
1) 099 surgery 07/18/1998
2) 157 biopsy 05/20/2001
3) 157 x-ray 06/02/2001
4) 157 blood work 06/02/2001
5) 157 MRI 09/17/2001
6) 157 surgery 10/03/2001
7) 228 surgery 09/14/2001
8) 228 blood work 11/29/2002
;
run;

proc sql;
create table combined as
select a.id, b.procedure, a.start, b.date, a.end, a.tempa, b.tempb
from first as a inner join second as b
on a.id = b.id and b.date between a.start and a.end;
quit;
proc print; run;
 
teralearner's solution is a good one!

If you need a more traditional SAS solution (I have some programmers here who do not like proc sql - sigh - I will convert them some time) you could do something like this (probably depends on shop standards a bit - or personal preference).

(you read them in how you described)

Proc sort data=Hosp;
by ID;

proc sort data=Event;
by ID;

Data HospProcedures;
merge Hosp(in=GotHospRec) Event(in=GotEvent);
by ID;
if GotHospRec & GotEvent
Then
If Start <= Date & Date <= End Then Output;

Run;


now you have two options
 
Thanks! I have good news and bad news. The good news is that I tried teralearner's solution and it worked beautifully. The bad news is that I tried jymm's solution and-- no dice.

jymm, your solution seems to have the same problem as my first attempt. Apparently, when you're merging by ID only, SAS ignores all other fields and merges one by one in sequence. This is NOT what I want. Here's what I think SAS does conceptually with your program and my attempt:

[tt] merge records
HOSP EVENT in date
Iteration rec # rec # ID range?

1 1 to 1 099 no

2 2 to 2 157 no
3 3 to 3 157 YES
4 4 to 4 157 no **
5 5 to 5 157 no
6 5 to 6 157 YES

7 6 to 7 228 no
8 7 to 8 228 no[/tt]

** Notice that at iteration 4, it's comparing records 4 to 4, not 3 to 4, as we humans would do. So the dataset ends up with only two records.

So, where do I learn more about PROC SQL? (I found it in my SAS Procedures Guide, Version 6, Third Edition, 1990-- surprise, surprise! Is this documentation OK, or is there a better reference?)
 
If you merge without the BY statement you will get the behavior that you describe. Interesting that you ran into this though.

With Proc SQL you can pretty much do anything that you can do with SQL.

I used 'Professional SAS Programming Shortcuts' by Rick Aster to get my headstart on that one... it has a bunch of other info to.
 
Here is a paper on proc sql:

You problem is a many to many merge, since id variable is not unique in either of the data sets. There is no one step solution with conventional merge for many to many merge. If you want to do it in data step, you need to create many data sets from one of your data sets, within each of them ID should be unique. Merge every newly created data set with the other dataset. Put the merged data sets back together. You can achieve this with a macro.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top