Dear all,
Matching fields with random selection
I would like to do the following matching and would be very grateful if someone could provide a script for it. The input dataset is large, on the order of millions of lines, so it is almost impossible to load into statistical software. I would therefore like to perform this step first, to reduce the dataset to a size that is easier to manage and analyse.
I have one file with 5 variables. The status variable (field 5) is either 1 or 0.
I refer to a record as a "case" if field 5 equals 1 and as a "control" if it equals 0.
The input file has the following format. I have created a simple example for illustration.
Input
code days ageday sex status
a 4 16 1 1
b 3 15 1 1
c 4 15 2 1
d 5 18 1 0
e 6 17 2 0
f 3 15 2 0
g 6 19 2 1
For each case, I need to find controls that match that case on ageday and sex (fields 3 and 4).
The number of controls may vary from one matched case to another.
Desired Output
fset code days ageday sex status index
1 a 4 16 1 1 1
1 d 3 16 1 0 14
1 a 3 15 1 0 2
2 b 3 15 1 1 5
2 d 2 15 1 0 15
3 c 4 15 2 1 8
3 f 3 15 2 0 23
3 g 2 15 2 0 30
In the above output, I randomly selected 2 controls for each case for illustration. Note that in the status column the case comes first, followed by its 2 controls, then another case followed by one control, and finally another case followed by 2 controls. So in the example, 3 "fset" groups are formed. The fset column numbers the sets formed by matching on ageday and sex.
The index column records which unit (line) was selected when the random controls were drawn. I labelled all the lines (units) from 1 to 30 to keep track of the selected controls. For example, control 14 (in code b) happened to be chosen for the case in code a.
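To make the request concrete, here is a rough Python sketch of the matching as I understand it, not a finished solution. It assumes a whitespace-separated file with one header line, that a control may be used for only one case, and that holding the cases and controls in memory is acceptable. The function name match_controls and the parameters n_controls and seed are my own placeholders.

```python
import random
from collections import defaultdict


def match_controls(lines, n_controls=2, seed=None):
    """Return rows of (fset, code, days, ageday, sex, status, index).

    `lines` is any iterable of text lines (for example an open file) whose
    first line is a header. `index` is the 1-based position of a line
    counted after the header.
    """
    rng = random.Random(seed)
    cases = []                  # case records, kept in input order
    pool = defaultdict(list)    # (ageday, sex) -> controls not yet used

    it = iter(lines)
    next(it, None)              # skip the header line
    for index, line in enumerate(it, start=1):
        fields = line.split()
        if len(fields) != 5:
            continue            # ignore blank or malformed lines
        code, days, ageday, sex, status = fields
        record = (code, days, ageday, sex, status, index)
        if status == "1":
            cases.append(record)
        else:
            pool[(ageday, sex)].append(record)

    out = []
    fset = 0
    for case in cases:
        candidates = pool.get((case[2], case[3]), [])
        if not candidates:
            continue            # unmatched case: no set is formed
        fset += 1
        out.append((fset,) + case)
        rng.shuffle(candidates)  # random order, then take from the end
        for _ in range(min(n_controls, len(candidates))):
            # pop() removes the control from the pool (assumed: no reuse)
            out.append((fset,) + candidates.pop())
    return out
```

It could be called as match_controls(open("input.txt"), n_controls=2), where input.txt is a hypothetical file name. If controls should be reusable across cases, rng.sample(candidates, k) without the pop() would be the variation to try.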
In addition, I need a summary after matching. In this example, I have
-------------------------------------------------------
1 case-control set is incomplete (only one control)
1 case could not be matched (no control found)
------------------------------------------------------
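The summary could be produced along these lines. This is only a sketch: it assumes the number of controls selected for every case (including zero for unmatched cases) has already been recorded, and the names summarize, format_summary and wanted are placeholders of mine.

```python
from collections import Counter


def summarize(controls_per_case, wanted=2):
    """Tally complete, incomplete and unmatched cases.

    `controls_per_case` holds, for every case, how many controls it received.
    """
    tally = Counter()
    for n in controls_per_case:
        if n == 0:
            tally["unmatched"] += 1
        elif n < wanted:
            tally["incomplete"] += 1
        else:
            tally["complete"] += 1
    return tally


def format_summary(tally):
    """Render the two summary lines requested above."""
    return "\n".join([
        f"{tally['incomplete']} case-control set(s) incomplete (fewer controls than requested)",
        f"{tally['unmatched']} case(s) could not be matched (no control found)",
    ])
```

For the example above, cases a, b, c and g receive 2, 1, 2 and 0 controls, so summarize([2, 1, 2, 0]) counts one incomplete set and one unmatched case.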
Thank you very much for your help. Please do not hesitate to ask me for clarification if anything is unclear.
Cheers,
T