Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Merging in DO LOOP??

Status
Not open for further replies.

wphupkes

Technical User
Jul 1, 2008
22
NL
Hello all,

I'm a newbee at SAS, and I'm struggling with the following problem.

I've created 1 table with dates, like

2008-01-01
2008-01-02
2008-01-03
etc.

And I've got another table with customer id's, incident dates and incident id's, like

Customer_id | Incident dates
123 | 2008-01-02
123 | 2008-01-05
126 | 2008-01-05
128 | 2008-05-03
128 | 2008-05-28
128 | 2008-06-01
etc.

What I want to create is a list per Customer ID, which overlays the first dates table (sort of outer join, but per customer), so the dates table stays intact (I try the merge statement, but this don't leaves the dates table intact). So this will create a huge table, because all the customers have there own list (I like to have all lists below each other. Like:

For Customer ID 123 the list will be:

2008-01-01 |
2008-01-02 | 2008-01-02
2008-01-03 |
2008-01-04 |
2008-01-05 | 2008-02-05
2008-01-06 |
Then, the following customer ID and so on.

I tried several hours to make this work (DO Loops, merges, etc), but I can't figur it out. Maybe anyone can help me? Really thanks in advance!!!
 
Hi Duckman,
It might help us to know exactly what you're trying to achieve, ie, what is the purpose of creating this table. It seems like a really odd thing to want to do, so it might be that there is a completely different way to achieve this without creating a monster table like the one you are currently trying to build.
I'm guessing you want to end up with a table that looks like this:-
Customer_id | Date | Incident dates
123 | 2008-01-01 |
123 | 2008-01-02 | 2008-01-02
123 | 2008-01-03 |
123 | 2008-01-04 |
123 | 2008-01-05 | 2008-02-05
123 | 2008-01-06 |
...
...
126 | 2008-01-01 |
126 | 2008-01-02 |
126 | 2008-01-03 |
126 | 2008-01-04 |
126 | 2008-01-04 | 2008-01-05

It seems like a very wasteful table. If I HAD to build something like this, I'd do the following.
1 - Put a new field on the dates table called KEY and put '1' in each record.
2 - Get a list of distinct customer IDs, append to this a field named Key with 1 in it.
3 - Merge these 2 files together by KEY.
4 - Merge the result with your list of incidents by customer_ID and date, flag those records that match accordingly.

However, I think that there is almost certainly a better way to achieve your aim if you can tell us what it is. I've seen things like this done before, and there's almost always a better route to the solution.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Thanks Chris!

The only thing with your solution is that a '1' after each record wouldn't create the table, no mather what kind of join I used (in= or double in=). But the solution was simple, I've created an "do date=Start_date to End_date by 1;" for every ID. After this, merging was no problem. But, like you've said, the table has millions of records. For now, I think it isn't a real problem, besides you can see a newbee has written the code. ;-)

I wanted to find a decent solution for this, but since my only programming experience is MSX/GW-basic (and Access / Excel and a basic SAS course), I have to find the logic of SAS (which I find very hard right now). The 'real' table has another variable, the ammount of arrears in months. My final goal is to create for each reporting date an overview of the arrears, like:

PERIOD (in days) | # of Arrears between 0 and 1 month | # of Arrears between 1 and 2 months | # of Arrears between 2 and 3 months | etc.

Between the incident dates as stated in my original post, the arrear can be between 0 and infinite. On the row of an incident date, the amount of arrear is placed. There's only an incident date when something changes, like an increase of the arrear or a removal. Arrears can be originating every day of the month, and can exist of more decimals. Since there's not an update every day (sometimes a month in between), the blanc data of arrears between the incident dates has to be the latest data known. Right now, I've filled the missing records between the incident dates in the huge table (with the last known data), so it shouldn't be too hard to get my overview (just counting with an if statement for the different types of arrear).

But... the nicest solution would be to have a code which scans all id's without creating any table (besides the table with the actual arrear dates). Maybe you've got a general idea to get this done on the 'right' way? Thanks again!
 
So to confirm, your customer data table has customer ID, Incident date, and arrears in months yes? Or is the other dates table the one containing the arrears data?
And you want to be able to show how many customers are in each "arrears band" within a given time window (your "Period" column in your report)?
Would "Period" be say a calendar time period such as "January", "February" etc, or rolling time windows "Last 7 days","Seven days before that" etc etc.
How would you handle a customer with multiple events (and possibly different arrears amounts) within 1 time period?
It's tantalisingly similar to a job I helped some people out with a couple of years ago, they ended up blowing out all the disk space on their system trying to do the same thign that you're doing, but we eventually came up with a much sleeker solution.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Hey Chris,

No, right now I've got 1 table with reporting dates (start and end dates can be set manually by me), the incident dates (the date a change in arrear occur) and the actual arrear. I want to report on day-bases (so I can get a clear graphical overview for every month), like:

Period in Arrears in months
days (DMY) [0-1] [1-2] [2-3] [3+]
01-01-08 | 80 | 40 | 20 | 5 |
02-01-08 | 85 | 30 | 19 | 3 |
03-01-08 | 95 | 45 | 22 | 6 |
04-01-08 | 76 | 43 | 17 | 4 |

Because I want to report on day bases, there's just 1 incident date per customer (id) possible.

Again, it's possible with my huge table, but preferrable...? The best way is to have some set of functions who can create the table above, but I don't have any clue how...
 
OK, well, I think it's doable. The process I'd follow is, as usual, to build it up in stages.
I'm a bit rushed off my feet at the moment getting some things sorted for a meeting this afternoon, but I might have a go over the weekend.
From the sounds of it, what you need is the last incident date (change in arrear) before each report date, then I guess add the number of days between the incident date and the arrear to the arrear.
I think that programatically you've probably hit the simplest method of producing your report.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Yes, well something like this:

Step 1 (no idea how to do without a huge table!): Determine the arrear for every customer_id on every date, where the arrear on a specific date is the last known arrear (on incident date). So if the arrear on the 1st of february was 1 month, it still should be 1 month on the 5th if there isn't an incident date in between.

Step 2 (not that difficult): attach a label on every kind of arrear (0.5 months = Bucket [0 - 1]).

Step 3 (also do-able for me): count the times a specific Bucket on a specific date occurs and create the table as shown.

If you can provide me with some knowledge of doing step 1 without just creating monster-tables, that would be great!!
 
Well, for this moment I'll stick to the huge mother I think... ;-)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top