Hi,
I want to count the occurrences of the different values in the second field of a file.
Here's an example of the file 'Vch_Head_NotSorted':
022276896167., 0000100.00,20050807
009377980266., 0000200.00,20050807
085142882117., 0000300.00,20050807
049156298766., 0001000.00,20050807
041342839181., 0001000.00,20050807
082753305176., 0001000.00,20050807
052027453477., 0000900.00,20050807
039021575680., 0000500.00,20050807
032238864563., 0000500.00,20050807
020162010606., 0000500.00,20050807
I've used the following Unix command, which sorts the file and counts the duplicates:
sort -k 2,2 Vch_Head_NotSorted | uniq -f 1 -c > Vch_Count_output.txt
This technique works fine for small files; however, my file has 2,500,000 rows, and the output becomes fragmented, e.g.
4 169254214268., 0000001.00,20051129
11 024731477144., 0000002.00,20050807
4 167721924291., 0000002.00,20051129
70536 000008882400., 0000050.00,20010801
1 960128879889., 0000050.00,20020804
100 008308765031., 0000050.00,20021228
144 006136599792., 0000050.00,20040816
143 006452636359., 0000050.00,20040823
52 002133530269., 0000050.00,20040827
86 003102237868., 0000050.00,20040829
101 003655825721., 0000050.00,20040903
46 003931361433., 0000050.00,20040913
instead of:
4 169254214268., 0000001.00,20051129
15 024731477144., 0000002.00,20050807
71209 000008882400., 0000050.00,20010801
I am not concerned with the first and third fields, just each distinct value in the second field and its count.
Is there an awk-based script that could produce this?
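I started sketching something along these lines (untested, and assuming the lines can simply be split on the comma; the array name is just a placeholder):
awk -F',' '{ count[$2]++ } END { for (v in count) print count[v], v }' Vch_Head_NotSorted > Vch_Count_output.txt
My understanding is that count[$2]++ should tally each distinct second-field value in an array, and the END block should then print each value with its total (in no particular order, and the field keeps the leading space after the comma), but I'd welcome corrections or a better approach.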
Many thanks in advance ;-)