Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to use .first variable in SAS data step

Status
Not open for further replies.

wannabe2007

Technical User
Jul 22, 2009
5
CA
Scenario: Want to pull only the first record of a dataset by user ID (may be duplicates for any given user) by earliest date and record number ID (may be duplicates for any given user). This is what I’ve tried so far and it is not always pulling the first visit.

proc sort data=FirstVisit;
by UserID DATE RecordNo;
run;

data FirstVisit;
set FirstVisit;
/*maybe I should be using a retain statement here???)
by UserID DATE RecordNo;
/*If it was their first offer*/
if first.UserID then VisitType = 'Initial Visit';
else VisitType = 'FollowupVisit';
run;

proc sort data= FirstVisit;
by VisitType UserID;
run;

data Visits;
set Visits;
keep UserID VisitType DATE RecordNo;
if VisitType = ‘Initial Visit';
run;

Thanks for any suggestions re above data steps or alternative code you may be able to provide. Much appreciated.
 
You might try:

proc sort data=FirstVisit;
by UserID descending DATE RecordNo;
run;

or/and use NOPDUPLICATES or NODUPKEY on the sort.
 
What type of value is your date? It's possible that your Date is not a SAS date and therefore is not being sorted properly. You may need to use a put or input to get it into a SAS date and then sort it.

Also, if you say if first.UserID; It will keep that first record without creating that VisitType field.

Like so:

Code:
data FirstVisit;
   set FirstVisit;
 /*maybe I should be using a retain statement here??? no) */ 
   by UserID DATE RecordNo;
   if first.UserID;
run;
 
It's usually a good idea when doing these first. and last. jobs to look at the physical data to make sure your sort is doing what you're expecting it to do. I frequently do this just so I can see the data before I start coding, it makes it easier for me to plan out the code.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
I want to thnak all of you who took the time to reply to my post.

I still haven't been able to always pull the first record but still working on it.

Thanks again.
 
Can you give examples of your data with formats/informats of the fields? I'm sure we'll be able to figure out what is going on.
 
To follow is the code I'm using. It seems to be working fine with the exception of the .first variable data step. Thanks for any additional help.


/* Calculate the median wait times to initial LTC placement by type of LTCF and situation (i.e. community & hospital)*/

/* First pull wait offers which are actual placements */

proc format;
value MDS_LTCFACILITYTYPE
1='NH'
2='RCF'
3='CBO';
run;

data waitoffer;
set SEA_DM.MDS_WAITOFFER;
/* Keep if offer was accepted */

if MDS_WAITOFFERDECISION = 1;
/* Only interested in placements between for fiscal year in question */

WAITOFFERDECISIONDT = datepart(MDS_WAITOFFERDECISIONDT);

if WAITOFFERDECISIONDT >= mdy(4,1,2008) and WAITOFFERDECISIONDT < mdy(4,1,2009) then fy = '2008/09';
if fy NE ' ';
format WAITOFFERDECISIONDT date9.;
keep MDSRESID MDS_WAITREGISTRATIONNUM MDS_WAITOFFERNUM MDS_WAITOFFERDECISION WAITOFFERDECISIONDT
MDS_LTCFACILITYSEQNUM fy;
run;

proc sort data=waitoffer;
by MDSRESID WAITOFFERDECISIONDT MDS_WAITREGISTRATIONNUM;
run;

data waitoffer;
set waitoffer ;
by MDSRESID WAITOFFERDECISIONDT MDS_WAITREGISTRATIONNUM;
if first.MDSRESID then PLACEMENTTYPE = 'Initial Placement';
else PLACEMENTTYPE = 'Transfer';
run;
proc sort data=waitoffer nodupkey;
by MDSRESID;
run;

proc print data=waitoffer (obs=50);
run;

/*Determine placement situation (i.e. from Community, from Hopital or Transfer)*/

data waitregistration;
set SEA_DM.MDS_WAITREGISTRATION;
if datepart(MDS_CURRASSESSMENTDATE) = mdy(1,1,1900) then ASSESSMENTDATE = datepart(MDS_ASSESSMENTDATE);
else ASSESSMENTDATE = datepart(MDS_CURRASSESSMENTDATE);

ASSESSMENTDATE = datepart(MDS_ASSESSMENTDATE);
format ASSESSMENTDATE date9.;

/*transdate = datepart(MDS_WAITTRANSFERDATE);*/

if MDS_HOSPITALSEQUENCENUM = 0 then type = 'C';
if MDS_HOSPITALSEQUENCENUM NE 0 then type = 'H';
if MDS_WAITTRANSFERDATE > '1jan1900'd then type = 'T';

keep MDSRESID MDS_WAITREGISTRATIONNUM MDS_WAITTRANSFERDATE MDS_WAITLTCPRIORITY
ASSESSMENTDATE MDS_HOSPITALSEQUENCENUM type;
run;

proc sort data=waitoffer;
by MDSRESID MDS_WAITREGISTRATIONNUM;
run;

proc sort data=waitregistration;
by MDSRESID MDS_WAITREGISTRATIONNUM type;
run;

data waitplacements;
merge waitoffer(in=a) waitregistration;
by MDSRESID MDS_WAITREGISTRATIONNUM;

if a;

/* Don't keep post-facility transfers*/
if datepart(MDS_WAITTRANSFERDATE) = mdy(1,1,1900);

/* Have to be regular priority - no AP or Level 2 */
if MDS_WAITLTCPRIORITY = 3;


run;

/* Now merge the proper district onto the data */

data ltcfacility;
set SEA_DM.MDS_LTCFACILITY;
keep MDS_LTCFACILITYSEQNUM MDS_COMMUNITYSEQNUM MDS_LTCFACILITY MDS_LTCFACILITYTYPE;
run;

proc sort data=waitplacements;
by MDS_LTCFACILITYSEQNUM;
run;

proc sort data=ltcfacility;
by MDS_LTCFACILITYSEQNUM;
run;

data waitplacements;
merge waitplacements(in=a) ltcfacility;
by MDS_LTCFACILITYSEQNUM;
if a;
run;

data community;
set SEA_DM.MDS_COMMUNITY;
keep MDS_COMMUNITYSEQNUM MDS_DISTRICT;
run;

proc sort data=waitplacements;
by MDS_COMMUNITYSEQNUM;
run;

proc sort data=community;
by MDS_COMMUNITYSEQNUM;
run;

data waitplacements;
merge waitplacements(in=a) community;
by MDS_COMMUNITYSEQNUM;
if a;
run;

/* Now add LTC facility communities */

data ltccommunity;
set SEA_DM.MDS_COMMUNITY;
keep MDS_COMMUNITYSEQNUM MDS_COMMUNITY;
run;

proc sort data=ltccommunity;
by MDS_COMMUNITYSEQNUM;
run;

proc sort data=waitplacements;
by MDS_COMMUNITYSEQNUM;
run;

data waitplacements;
merge waitplacements(in=a) ltccommunity;
by MDS_COMMUNITYSEQNUM;
if a;
days = WAITOFFERDECISIONDT - ASSESSMENTDATE;
run;

proc freq data=waitplacements;
tables days;
run;

/* Data quality check */

data waitplacements;
set waitplacements;

/* Remove missing values for days (large integer) and negative wait times */
if days < 10000 and days >= 0;
run;

proc freq data=waitplacements;
tables days;
run;

/* Make final dataset*/
proc sort data=waitplacements;
by PLACEMENTTYPE MDS_LTCFACILITYTYPE MDS_DISTRICT MDS_LTCFACILITY;
run;

data waitplacements;
set waitplacements;
/*format MDS_LTCFACILITYTYPE;*/
keep MDSRESID PLACEMENTTYPE MDS_LTCFACILITYTYPE MDS_DISTRICT MDS_LTCFACILITY ASSESSMENTDATE WAITOFFERDECISIONDT MDS_WAITREGISTRATIONNUM MDS_COMMUNITY type days fy;
if PLACEMENTTYPE = 'Initial Placement';
run;

proc freq data=waitplacements;
run;

proc contents data=waitplacements;
run;

proc sort data=waitplacements;
by MDSRESID MDS_DISTRICT MDS_LTCFACILITYTYPE PLACEMENTTYPE type WAITOFFERDECISIONDT MDS_WAITREGISTRATIONNUM;
run;

proc means data=waitplacements n mean median std var;
var days;
by MDSRESID MDS_DISTRICT MDS_LTCFACILITYTYPE PLACEMENTTYPE WAITOFFERDECISIONDT MDS_WAITREGISTRATIONNUM/*type*/;
output out = waitmeans mean(days) = meandays median(days) = mediandays N(days) = placements;
run;

proc freq data=waitplacements;
tables PLACEMENTTYPE*MDS_DISTRICT*MDS_LTCFACILITYTYPE MDS_WAITREGISTRATIONNUM/*type*//nocol norow nopercent;
run;

proc print data=waitmeans;
run;
 
I'm guessing that this is the section where you aren't getting the results you expect?
Code:
proc sort data=waitoffer;
   by MDSRESID WAITOFFERDECISIONDT MDS_WAITREGISTRATIONNUM;
run;

data waitoffer;
  set waitoffer ;
  by MDSRESID WAITOFFERDECISIONDT MDS_WAITREGISTRATIONNUM;

  if first.MDSRESID then PLACEMENTTYPE = 'Initial Placement';
  else PLACEMENTTYPE = 'Transfer';
run; 

proc sort data=waitoffer nodupkey;
   by MDSRESID;
run;

proc print data=waitoffer (obs=50);
run;

First up, writing results out to the same dataset you read in, makes it a bit more difficult to debug, I'd recommend avoiding it where possible.

Secondly, I NEVER use nodupkey on a proc sort to dedupe when I specifically want certain records out. In theory it should work, but I like to be absolutely sure. Besides which, you can in this case very easily skip that step completely (which is more efficient as you don't need to process the datset a second time).
Try this instead,
Code:
proc sort data=waitoffer;
   by MDSRESID WAITOFFERDECISIONDT MDS_WAITREGISTRATIONNUM;
run;

data waitoffer2 transfers;
  set waitoffer ;
  by MDSRESID WAITOFFERDECISIONDT MDS_WAITREGISTRATIONNUM;

  if first.MDSRESID then
  do;
    PLACEMENTTYPE = 'Initial Placement';
    output waitoffer;
  end;
  else
  do;
    PLACEMENTTYPE = 'Transfer';
    output transfers;
  end;
run; 

proc print data=waitoffer2 (obs=50);
run;

proc print data=transfers (obs=50);
run;

If this doesn't fix your problem then you'll need to explain what the exact issue is. For instance, if it's notbringing back the first record all the time what record IS it bringing back, and at which point are you looking at the dataset.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top