This is what I am trying to do in AWK. My main problem is step 2. I have shown a sample dataset below, but the original dataset has 100 fields and 2000 records.
Algorithm
1) initialize accuracy = 0
[pre]2) for each record r
--Find the closest other record, o, in the dataset using distance formula[/pre]
3) If the class value of the closest record o equals the class value of the current record r, increment accuracy by 1. Here, the class value is the last field (col 6).
4) Finally, compute 100 * accuracy / total_records
Sample Dataset
[pre] c1 c2 c3 c4 c5 c6 --> Columns
0.6 0.1 0.2 0.3 0.4 0.3 --> row1 & row7 nearest neighbor in c1
0.1 0.2 0.1 0.1 0.1 0.6 and same values in c6(0.3) so ++accuracy
0.2 0.3 0.1 0.1 0.2 0.6
0.3 0.4 0.1 0.1 0.3 0.3
0.4 0.5 0.1 0.1 0.9 0.6
0.5 0.6 0.1 0.1 0.8 0.9
0.6 0.7 0.1 0.1 0.7 0.3
0.7 0.8 0.1 0.1 0.6 0.6
0.8 0.9 0.1 0.1 0.5 0.9
0.9 1.0 0.1 0.1 0.4 0.3
[/pre]Code
[pre]BEGIN{
    accuracy = 0;
    total_records = 10;
}
{
    for (i = 1; i <= 5; i++)  # for fields 1 to 5 only
    #for each record
    {
        #find closest record (calculating the distance)
        distance = abs($i - other_records)
        #compare values of field 6 for closest and current (each) record
        if (current_record_field_6.value == closest_record_field_6.value)
        {
            ++accuracy;
        }
    }
}
END{
    percentage = 100 * (accuracy/total_records);
    print percentage;
}
[/pre]I am struggling to work out how to find the closest record for each record in the dataset using AWK. As far as I know, the main '{}' block is executed only once per record, so the other records are not available while the current one is being processed.
Any help or suggestion is much appreciated.
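
One possible way around the once-per-record limitation is to store every record in an array in the main block and do the pairwise comparison in END, once the whole file has been read. The following is only a minimal sketch under some assumptions that the post does not confirm: the "distance formula" is taken to be squared Euclidean distance over all fields except the last, the class value is the last field, every record has the same number of fields, and ties go to the first record encountered. The function name nn_accuracy is made up for illustration.

```shell
# nn_accuracy FILE
# Prints the leave-one-out nearest-neighbour accuracy as a percentage.
# Assumption: whitespace-separated numeric fields, class value in the last field.
nn_accuracy() {
    awk '
    NF == 0 { next }                  # skip blank lines
    {
        n++                           # count data records
        nf = NF                       # assumes every record has the same NF
        for (j = 1; j <= NF; j++)
            val[n, j] = $j            # keep the whole dataset in memory
    }
    END {
        if (n < 2) exit 1             # a record needs at least one neighbour
        accuracy = 0
        for (r = 1; r <= n; r++) {    # step 2: for each record r
            best = 0
            for (o = 1; o <= n; o++) {
                if (o == r) continue  # only compare against *other* records
                d = 0
                for (j = 1; j < nf; j++) {
                    diff = val[r, j] - val[o, j]
                    d += diff * diff  # squared Euclidean distance (assumed)
                }
                if (best == 0 || d < bestd) { best = o; bestd = d }
            }
            # step 3: compare class values of r and its closest record
            if (val[r, nf] == val[best, nf]) accuracy++
        }
        printf "%.2f\n", 100 * accuracy / n   # step 4
    }' "$1"
}
```

It could be called as `nn_accuracy data.txt`, where data.txt is a hypothetical file holding only the data rows (no header line and no trailing annotations). The nested loops do n * (n - 1) distance computations, so with 2000 records and 100 fields this will take a while but should still be workable. Note that in the sample dataset several records are equally close to each other, so the result there depends on how ties are broken.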