Matching fields with random selection

Status
Not open for further replies.

tivona

Technical User
Oct 23, 2010
HK
Dear all,

I would like to do the following matching and would be very grateful if someone could provide a script for it. The input dataset is large, on the order of millions of lines, so it is almost impossible to load into statistical software. I would therefore like to perform the matching first, to reduce the dataset to a size that is easier to manage and analyse.

I have one file with 5 variables. The status variable (field 5) is either 1 or 0.

I refer to a line as a "case" if field 5 equals 1 and as a "control" if it equals 0.

The input file has the following format. I have created a simple example for illustration.

Input

code days ageday sex status
a 4 16 1 1
b 3 15 1 1
c 4 15 2 1
d 5 18 1 0
e 6 17 2 0
f 3 15 2 0
g 6 19 2 1


For each case, I need to find controls that match that case on ageday and sex (fields 3 and 4).
The number of controls available varies from case to case.
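
To make the matching rule concrete, here is a minimal Python sketch of the first step: reading the file line by line and grouping the controls by their (ageday, sex) key. It assumes whitespace-separated fields and a single header line; the function name build_control_pool and the unit numbering (data lines counted from 1) are my own choices for illustration, not part of the original request.

```python
from collections import defaultdict

def build_control_pool(lines):
    """Group controls (status == "0") by their (ageday, sex) key.

    `lines` is any iterable of text lines, e.g. an open file, so the
    input is streamed rather than loaded all at once.
    Assumption: the first non-blank line is the header.
    """
    pool = defaultdict(list)
    header_seen = False
    unit = 0  # running number of the data line, starting at 1
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if not header_seen:
            header_seen = True  # skip the header line
            continue
        unit += 1
        code, days, ageday, sex, status = fields
        if status == "0":
            pool[(ageday, sex)].append((unit, code))
    return pool

# The small input example from this post.
sample = """code days ageday sex status
a 4 16 1 1
b 3 15 1 1
c 4 15 2 1
d 5 18 1 0
e 6 17 2 0
f 3 15 2 0
g 6 19 2 1
"""
pool = build_control_pool(sample.splitlines())
```

Only the controls are kept in memory, keyed by the two matching fields, so looking up the candidates for a case is a single dictionary access.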

Desired Output

fset code days ageday sex status index
1 a 4 16 1 1 1
1 d 3 16 1 0 14
1 a 3 15 1 0 2
2 b 3 15 1 1 5
2 d 2 15 1 0 15
3 c 4 15 2 1 8
3 f 3 15 2 0 23
3 g 2 15 2 0 30



In the above output, I randomly selected up to 2 controls for each case, for illustration purposes. Note that, in the status column, each case comes first and is followed by its controls: a case followed by 2 controls, then another case followed by one control, and finally another case followed by 2 controls. So 3 "fset" are formed in the example. The fset column numbers the sets formed by matching on ageday and sex.
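
The random selection described above could be sketched in Python roughly as follows. This is only an illustration under my own assumptions: rows are already parsed into (code, days, ageday, sex, status) tuples, the names match_cases, k and seed are hypothetical, and a case with no matching control is skipped without being given an fset number.

```python
import random
from collections import defaultdict

def match_cases(rows, k=2, seed=None):
    """Return output rows (fset, code, days, ageday, sex, status, index).

    Each case (status == 1) is followed by up to k randomly chosen
    controls (status == 0) with the same ageday and sex.
    """
    rng = random.Random(seed)

    # Index the controls by (ageday, sex); index is the 1-based row number.
    pool = defaultdict(list)
    for index, row in enumerate(rows, start=1):
        if row[4] == 0:
            pool[(row[2], row[3])].append((index, row))

    out = []
    fset = 0
    for index, row in enumerate(rows, start=1):
        if row[4] != 1:
            continue
        candidates = pool.get((row[2], row[3]), [])
        if not candidates:
            continue  # unmatched case: no set is formed
        fset += 1
        out.append((fset, *row, index))
        # Sample without replacement within one case; the pool itself is
        # never reduced, so a control can be drawn again for another case.
        for c_index, c_row in rng.sample(candidates, min(k, len(candidates))):
            out.append((fset, *c_row, c_index))
    return out
```

Passing a fixed seed makes the selection reproducible, which helps when the matching has to be rerun on the same file.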

The index column records which unit was selected when the random controls were taken. I label all the lines (units) from 1 to 30 to keep track of the selected controls. For example, control 14 (in code b) happens to be chosen for the case in code a.


In addition, I need a summary after matching. In this example, I have

-------------------------------------------------------
1 case-control set is incomplete (only one control)
1 case could not be matched (no control found)
------------------------------------------------------
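
The two summary figures can be derived from the number of controls found for each case. A minimal Python sketch, assuming a hypothetical dictionary that maps each case code to the number of controls selected for it, and a hypothetical parameter wanted for the number of controls requested per case:

```python
def summarize(control_counts, wanted=2):
    """Return (incomplete, unmatched) counts for the matching summary.

    incomplete: cases with at least one control but fewer than `wanted`
    unmatched:  cases for which no control was found
    """
    incomplete = sum(1 for n in control_counts.values() if 0 < n < wanted)
    unmatched = sum(1 for n in control_counts.values() if n == 0)
    return incomplete, unmatched

# Counts taken from the example in this post: a and c have two
# controls each, b has one, and g has none.
counts = {"a": 2, "b": 1, "c": 2, "g": 0}
incomplete, unmatched = summarize(counts, wanted=2)
print(f"{incomplete} case-control set is incomplete (fewer than 2 controls)")
print(f"{unmatched} case could not be matched (no control found)")
```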


Thank you very much for your help. Please do not hesitate to ask me for clarification if anything is unclear.

Cheers,

T

 
Further to the problem I posted earlier, I did not mention that the same control can serve more than one case. That is, after a control is chosen for the case in code a, for example, that same control can subsequently serve as a control for another case, provided that both the ageday and sex fields match.

Line 3 of the output

fset code days ageday sex status index
1 a 3 15 1 0 2

Why can code a be used as its own control?

For code a, the subject was still alive on day 15 (the subject died only on day 16), so its status on that day was 0 and it can be used as its own control.

Best,

T
 