Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Can I Use SAS Arrays To Do This? 1

Status
Not open for further replies.

bruce282

Programmer
Jan 26, 2007
8
US
I have a file of records that contains a 7 position id, an 8 position YYYYMMDD start date, a state code, and an YYYYMMDD end date. There can be from 1 to 30 records with the same id. I'm trying to remove overlapping dates from the file. I need to read in all records with the same id and be able to walk the array from top (oldest date) to bottom (newest).

I have read the various SAS documents we have at work, and all of them say the same thing, SAS arrays are not the same as arrays used in other languages. My boss wants me use SAS, I know I could do it in COBOL on our mainframe, but he's trying to prove the worth of PC/SAS and I'm up for new things.

So can I use arrays to do it, or am I approaching this all wrong?

Thanks,

Bruce
 
Well, I don't know how arrays work in other languages, but they are reasonably straightforward in SAS and it sounds like it might well be what you need. I don't think you can sort an array in SAS (I'll be honest, I use them very infrequently), however you can of course sort the data before loading it into the array. Once data is in the array you can loop through the array using a do loop easily enough.
Is that enough for you to go on with or do you want some code examples?
If you're happy to plug away at it, the details you'll want to look up in the doco are
ARRAY statement
DO loop (Do statement [do i = 1 to 20])
That should cover the bulk of what you're trying to do I reckon.


Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Chris,

Thanks for information. I've used SAS on the mainframe for 4 years, but for simple "quick and dirties". I'm a COBOL and VB programmer by trade.

In reading all I could about SAS arrays from forums, the help files and 2 SAS programming books I admit to not knowing how SAS gets the data into an array. Between the books and online sources I've been pointed in the direction of using the control number as the by variable when inputting the data, using first.control & last.control, using both at the same time, and using neither.

So what I need to do is read the input file which has been sorted in control number, start date, state order. If there is only 1 record for this control number, write it to the output, else put it an array and get the next record. When all the records (currently up to a max of 31 per control number) for a control number have been read in, process them, this could include merging two back to back records into 1, making 3 records out of 2 based on a change in state code. I'm hoping to be able to massage the data in the array, but first I have to get the data into the array, that's what I'm hung up right now.

Sorry this is so long, but I thought more information might be better in this case.


Bruce
 
Chris,

I forgot to mention I planned on using a 2 dimensional array, 40 rows (1 per record) by 4 columns for the 4 fields in each input record.

Bruce
 
I don't think this type of need require arrays. You first have to know your data. What is the definition for overlapping dates. I have had that definition vary. Once you define what a record with overlapping dates is you can start to do this with very little work in SAS.

Another question you should ask yourself is if you have two records that start or end on the same data point (date) how do you set the sequence. Do you care about retaining records that have duplicates. I mean, I know that you want to remove duplicate records, but do you want to remove the all but one and you don't care which one. Or, do you want to sort the records and take the latest or the first in that series. After you decide on your business algorithm you can use the PROC SORT and some data step procedures.

ex.
Code:
proc sort
  data = your_input_data
  out  = your_data;
  by pos_ID startdate enddate state;
run;
Now if your business algorithm detirmins that the start date is what an overlapping record is you can take the first record with the same date.
like this:
Code:
data your_NON_overlap;
  set your_data;
  by pos_id startdate;
  if first.startdate;
run;
You should have all your records in this dataset. Of course, you can do a hell of a lot more processing if you needed to. Arrays are used on the record level not the dataset level. You could use it at the dataset level but why would you? It wasn't designed for that.

 
If you're uncertain about loading data into an array, and then using it, take a look at this code. It's a cut down version of some code I used to examine people buying into raffle draws. It shows the array being created, initialised, filled, read and used and then written out again.
Code:
data current_status2(drop=i j x previous next current counter);
  set current_Status1;
  by ID DRAW;

  retain counter ;

  array aDRAW{12}      4  _TEMPORARY_;
  array STATS{12}     $5  _TEMPORARY_;
  array MOVE{12}     $10  _TEMPORARY_;

  if first.ID then
  do;
    * Reset everything for first of group *;
    counter = 0;
    do x = 1 to 12;
      AUS{x}   = .;
      STATS{x} = '';
      MOVE{x}  = '';
    end;
  end;

  * This will give us the number of records for the group *;
  counter + 1;

  * Load values into array *;
  aDRAW{counter} = draw;
  STATS{counter} = new_status;


  *Once all records for group are loaded, start checkign values in array *;
  if last.buyerID then
  do;
    do i = 1 to counter;
      previous = i - 1;
      next = i + 1;
      current = i;


    IF STATS{next} ne '1'
         OR aDRAW{next} ne aDRAW{current} + 1 then 
           MOVE{CURRENT} = '2';
    ELSE IF STATS{next} = '1' and DRAW{next} = DRAW{current} + 1 then 
           MOVE{current} = '1';
    end;

    * Output each item in the array again *;
    do j = 1 to counter;
      draw = aDRAW{j};
      NEW_STATUS = STATS{j};
      MOVEMENT = MOVE{j};
      output;
    end;
  end;
run;

I hope that this helps, it sounds like this is pretty much what you want to do.


Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top