Using Tag in loop and in (drop=tag)

Status
Not open for further replies.

RinaGreen

Technical User
Mar 8, 2005
31
US
Hi

I'm completely new to SAS and am confused by TAG in the following code. I assume this is some kind of conditional loop, but what is the role of TAG? I looked for it in the online help but couldn't find it. Could you please explain it to me? Thank you in advance.

Rina

data data1 data2 data3;
  set mydata;
  if (condition) then do;
    tag=1;
    output data1;
  end;
  else if (condition2) then do;
    tag=2;
    output data2;
  end;
  else if (condition3) then do;
    tag=3;
    output data3;
  end;
run;
/* after that, each of data1, data2 and data3 is sorted by id */

data data1_3;
  set data1 data2 data3;
run;

proc sort data=data1_3;
  by id tag;

data final (drop=tag);
  set data1_3;
  by id tag;
  if first.id;
run;

 
Rina,
The TAG variable that you found in the code is just that, a tag. It looks like the programmer wanted to 'tag' which condition each record met and then sort on it. So if a record satisfied condition 1, it was given tag=1 so that this could be recognized later.

The proc sort is missing the ending RUN statement.
(Probably not your fault; pointing out small faults like this is just a pet peeve of mine.)

In the final data step, the programmer 'drops' the tag var so that the dataset doesn't have that var in it.

Now I do not understand why the programmer needed to go to all that trouble. They could have just set the tag var in a single dataset rather than splitting the data into data1-data3 and re-sorting afterward, but there are a million ways to skin a cat.
Klaz
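
For what it's worth, here is a minimal sketch of that single-dataset version. The toy input, the score variable and the cutoffs are made up purely for illustration and stand in for the real conditions; only the id/tag logic comes from the code above.

Code:
```sas
/* Toy input (hypothetical): several records per id. */
data mydata;
  input id score;
  datalines;
1 40
1 95
2 60
2 20
3 10
;
run;

/* Tag every record in one pass instead of writing data1-data3. */
/* The score cutoffs are placeholders for the real conditions.  */
data tagged;
  set mydata;
  if score >= 90 then tag = 1;
  else if score >= 50 then tag = 2;
  else tag = 3;
run;

/* Within each id, put the lowest tag first. */
proc sort data=tagged;
  by id tag;
run;

/* Keep only the first record per id, then drop the helper variable. */
data final (drop=tag);
  set tagged;
  by id tag;
  if first.id;
run;

proc print data=final;
run;
```

With this toy input, final should hold one record per id: (1, 95), (2, 60) and (3, 10). Note that in the original code a record matching none of the conditions is not written anywhere, whereas this sketch gives it tag=3.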
 
This is a dedupe on which the coder has set a priority order. The idea is to dedupe the records by ID, but to give preference to records where the first condition (tag=1) is true, then condition 2 (tag=2), and so on.

I've done similar things with multiple files where I want to dedupe between them so a person only appears on 1 list, but certain lists are given priority for one reason or another.

As an interesting little tweak, in this step:
Code:
data data1_3;
 set data1 data2 data3 ;
run;
if you changed it to
Code:
data data1_3;
 set data1 data2 data3 ;
  by id tag;
run;
the subsequent sort procedure wouldn't be needed (so long as each of the three datasets is sorted by ID beforehand; TAG is constant within each one, so they are then already in ID, TAG order), as the three datasets would be interleaved in that sort order.
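
Taking that one step further, the interleave and the dedupe could probably be combined into a single step. This is only a rough sketch, assuming data1, data2 and data3 from the original code still carry their tag values; the proc sort steps are repeated here just to make the "sorted by id" assumption explicit.

Code:
```sas
/* Assumed precondition: each input is in id order. */
/* tag is constant within each dataset, so this is  */
/* also id, tag order.                              */
proc sort data=data1; by id; run;
proc sort data=data2; by id; run;
proc sort data=data3; by id; run;

/* Interleave by id and tag, keeping the first      */
/* (lowest-tag) record for each id.                 */
data final (drop=tag);
  set data1 data2 data3;
  by id tag;
  if first.id;
run;
```

This skips the intermediate data1_3 dataset entirely, at the cost of being a little less obvious to read.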

 
Thank you very much. This is really about deduping!
 