Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Iterations applied to Updated new Data set

Status
Not open for further replies.

vassilisef

Technical User
Dec 14, 2008
14
GR
Hey,

I am having a SAS data set of the form:
Name P VO
ABC 4.80 1000
ABC 4.80 .
ABC 4.80 .
EFG 6.40 2000
EFG 6.40 .
EFG 6.40 .

but of course, with hundreds of names and thousands of observations (P & VO) for each name value.

My objective is to "clean" the tail of each name group for which the VO is a missing value. Namely, using just one Data step i would like to end up with an new set :

ABC 4.80 1000
EFG 6.40 2000


Naturally, if i use the following code:

Code:
data work.inputobserv;
infile datalines dlm=',';
input  name:$3. P:4.2 VO:4.;
datalines;
ABC,4.80,1000
ABC,4.80,.
ABC,4.80,.
EFG,6.40,2000
EFG,6.40,.
EFG,6.40,.
;
run;

data work.cleantail;
set inputobserv;
by name;
	do;
		if (last.name and VO=.) then delete;
	end;
run;

I will only get the Very last observation from each namegroup curtailed. Is there a way to make the 2nd iteration of the loop to apply to the dataset resulting from the 1st iteration (i.e. to a new dataset resulting after the 1st iteration has already deleted the last obs per name) ??

Thanks in advance,
Vassilis
 
I'm not sure I fully understand your question, but in my opinion - you really don't need a loop at all. A SAS datastep naturally loops through every record. And you are going to get funky results using a by statement if it isn't already sorted first.


proc sort data=inputobserv;
by name;
where VO ^=.;
run;

That sort would get rid of of the records with the missing values and assuming you have one entry like you have shown above - you would have your desired result.

Or you could output good records to a cleantail and another data set called missing.

data cleantail missing;
set inputobserv;
if VO=. then output cleantail;
else output missing;
run;

Not 100% sure what your objective is, but if I haven't answered your question - please post some more details and I'm sure I can help you.

 
Hey mdieckman,

First of all, i would like to thank you for your immediate response.

You are right, my query is not well explained by me:

My objective is delete the obs that exhibit a missing Value on the VO variable from the tail of each Name-Group ONLY, and not the obs that exhibit a missing Value on the VO variable that lie WITHIN each Name-Group.

Namely, if my initial SAS set is :

Name   P VO
ABC  3.70 900
ABC  3.60 850
ABC  3.10 .
ABC  4.80 1000
ABC  4.80 .
ABC    4.80 .
EFG 6.80 4000
EFG 6.40 2000
EFG  6.40 .
EFG 6.40 .


After the "Tail-curtailing", my desired output set will be:

Name   P VO
ABC  3.70 900
ABC  3.60 850
ABC  3.10 .
ABC  4.80 1000
EFG 6.80 4000
EFG 6.40 2000

As you can see, i managed to get rid of the missing VO tails without loosing the 3rd obs for ABC which remains valuable info for myself.

That is why, initially, i suggested a loop that will keep reading (and deleting) the Last.obs from each Name-Group until it "hits" a Non VO-missing obs in order for the Loop to stop.

Nevertheless, in order for this to be effective, each Loop iteration must be sequentially applied to the "updated" Name-group resulting from the previous iteration.

Most likely, there is another way of programming approach for my problem, but this is beyond my knowledge, since i am not an experienced user.

Thanks again,
Vassilis
 
My solution to this would be to reverse the sort order of the dataset, then delete each record where VO is missing, until such time as you encounter a value, then set a flag to indicate this. Use the RETAIN statement to keep this value across observations.
Something like this...
Code:
data work.inputobserv;
  infile datalines dlm=',';
  input  name:$3. P:4.2 VO:4.;
  recno = _n_;
datalines;
ABC,4.80,1000
ABC,4.80,.
ABC,4.80,.
EFG,6.40,.
EFG,6.40,2000
EFG,6.40,.
EFG,6.40,.
;
run;

proc sort data=inputobserv;
  by name descending recno;
run;

data work.cleantail;
  set inputobserv;
  by name;
  length flag $1;
  retain flag '';

  if first.name then flag='';
  if VO ne . then flag = 'Y';

  if flag = '' and VO = . then delete='Y';
run;

proc sort data=work.cleantail;
  by recno;
run;
I*'ve added in an extra example record to show you that earlier missing values are retained. Once you are happy that the code is removing the records you want removed, change the "delete='Y'" to simply "delete".

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Hi Vassilis,

This bit of code works by taking the number of the obs that is the first missing observation (_i_), so for the first group this would be 3, which would then be overwritten by 5, the _tail_ variable checks whether the last obs for that group is missing, if it is then we can conclude that the obs between _i_ to the last obs are tail end missing values. Consequently we can get rid of them.

data have ;
input Name $ P VO ;
cards ;
ABC 3.70 900
ABC 3.60 850
ABC 3.10 .
ABC 4.80 1000
ABC 4.80 .
ABC 4.80 .
BBB 8.55 1222
BBB 6.44 .
BBB 6.77 3444
EFG 6.80 4000
EFG 6.40 2000
EFG 6.40 .
EFG 6.40 .
;
data want (drop=_:);
do _n_=1 by 1 until (last.name) ;
set have ;
by name vo notsorted ;
if (first.vo and missing(vo)) then _i_ = _n_ ;
if (last.name and missing(vo)) then _tail_ = 1 ;
end ;
do _n_=1 to _n_ ;
set have ;
if ^(_n_ >= _i_ and _tail_) then output ;
end ;
run ;
 
Thank you both guys,

This puts a lot of studing & testing for me. I will use both suggestions and will let you know.

I am sincerely indebted,

Vassilis
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top