Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

count occurances and num of unique occurances 2

Status
Not open for further replies.
Feb 12, 2002
80
NO
Hi,

I have a large flat text file, comma delim. There are 27 columns in the file.

I want to do soemthing like this:
Code:
for each [field 1]
    count occurances in [field 7]
 and
    count UNIQUE occurances in [field 7]
So the output would be something like:

ABC, 20, 2
DEF, 12, 3

meaning that there are 20 occurances of anything in column 7 for ABC, but only 2 unique ones.

I hope that makes sense.

All help appreaciated, and the more flexible the code the better - I might want to mix the columns i.e. take a unique lisitng of field 11, but a count of field 7

thanks in advance!
littleIdiot

Example file for testing below - any sensitive info has been replaced with 'junk':
Code:
f01,f02,f03,f04,f05,f06,f07,f08,f09,f10,f11,f12,f13,f14,f15,f16,f17.f18,f19,f20,f21,f22,f23,f24,f25,f26,f27
test01,junk,junk,test01,junk,1,test01-01,,,line,test01-01,,test01-01,1,100,615,90.6043,,,53,junk,2005-12-10 13:28:42,junk,,,Line
test01,junk,junk,test01,junk,2,test01-02,,,line,test01-02,,test01-02,1,100,578,90.5721,,,49,junk,2005-12-10 13:28:42,junk,,,Line
test01,junk,junk,test01,junk,3,test01-03,,,line,test01-03,,test01-03,1,100,701,269.6372,,,62,junk,2005-12-10 13:28:42,junk,,,Line
test01,junk,junk,test01,junk,4,test01-04,,,line,test01-04,,test01-04,1,100,540,315.578,,,45,junk,2005-12-10 13:28:42,junk,,,Line
test01,junk,junk,test01,junk,5,test01-05,,,line,test01-05,,test01-05,1,100,674,139.0329,,,59,junk,2005-12-10 13:28:42,junk,,,Line
test01,junk,junk,test01,junk,6,test01-06,,,line,test01-06,,test01-06,1,100,543,315.2247,,,46,junk,2005-12-10 13:28:42,junk,,,Line
test01,junk,junk,test01,junk,7,test01-07,,,line,test01-07,,test01-07,1,100,625,360.3229,,,54,junk,2005-12-10 13:28:42,junk,,,Line
test01,junk,junk,test01,junk,8,test01-08,,,line,test01-08,,test01-08,1,100,579,180.3166,,,49,junk,2005-12-10 13:28:42,junk,,,Line
test01,junk,junk,test01,junk,9,test01-09,,,line,test01-09,,test01-09,1,100,621,0.5512,,,54,junk,2005-12-10 13:28:42,junk,,,Line
test01,junk,junk,test01,junk,10,test01-10,,,line,test01-10,,test01-10,1,100,510,226.1236,,,42,junk,2005-12-10 13:28:42,junk,,,Line
test01,junk,junk,test01,junk,11,test01-11,,,line,test01-11,,test01-11,1,100,725,48.5808,,,64,junk,2005-12-10 13:28:42,junk,,,Line
test01,junk,junk,test01,junk,12,test01-12,,,line,test01-12,,test01-12,1,100,543,224.9057,,,46,junk,2005-12-10 13:28:42,junk,,,Line
test02,junk,junk,test02,junk,1,test02-01,,,line,test02-01,,test02-01,1,100,613,90.837,,,53,junk,2005-12-10 13:30:25,junk,,,Line
test02,junk,junk,test02,junk,2,test02-02,,,line,test02-02,,test02-02,1,100,573,270.8786,,,49,junk,2005-12-10 13:30:25,junk,,,Line
test02,junk,junk,test02,junk,3,test02-03,,,line,test02-03,,test02-03,1,100,615,270.5644,,,53,junk,2005-12-10 13:30:25,junk,,,Line
test02,junk,junk,test02,junk,4,test02-04,,,line,test02-04,,test02-04,1,100,540,135.7096,,,45,junk,2005-12-10 13:30:25,junk,,,Line
test02,junk,junk,test02,junk,5,test02-05,,,line,test02-05,,test02-05,1,100,579,315.9338,,,49,junk,2005-12-10 13:30:25,junk,,,Line
test02,junk,junk,test02,junk,6,test02-06,,,line,test02-06,,test02-06,1,100,332,315.658,,,25,junk,2005-12-10 13:30:26,junk,,,Line
test02,junk,junk,test02,junk,7,test02-07,,,line,test02-07,,test02-07,1,100,473,180.7433,,,39,junk,2005-12-10 13:30:26,junk,,,Line
test02,junk,junk,test02,junk,8,test02-08,,,line,test02-08,,test02-08,1,100,451,0.8355,,,37,junk,2005-12-10 13:30:26,junk,,,Line
test02,junk,junk,test02,junk,9,test02-09,,,line,test02-09,,test02-09,1,100,473,180.7655,,,39,junk,2005-12-10 13:30:26,junk,,,Line
test02,junk,junk,test02,junk,10,test02-10,,,line,test02-10,,test02-10,1,100,572,226.4374,,,49,junk,2005-12-10 13:30:26,junk,,,Line
test02,junk,junk,test02,junk,11,test02-11,,,line,test02-11,,test02-11,1,100,465,46.4576,,,38,junk,2005-12-10 13:30:26,junk,,,Line
test02,junk,junk,test02,junk,12,test02-12,,,line,test02-12,,test02-12,1,100,546,44.9491,,,46,junk,2005-12-10 13:30:26,junk,,,Line
test03,junk,junk,test03,junk,2,test03-02,,,line,test03-02,,test03-02,1,1,1056,135.0001,,,107,junk,2005-12-10 13:33:01,junk,,,Line
test03,junk,junk,test03,junk,3,test03-03,,,line,test03-03,,test03-03,1,1,1363,315.2472,,,138,junk,2005-12-10 13:33:01,junk,,,Line
test03,junk,junk,test03,junk,4,test03-04,,,line,test03-04,,test03-04,1,1,1000,135.2342,,,101,junk,2005-12-10 13:33:01,junk,,,Line
test03,junk,junk,test03,junk,5,test03-05,,,line,test03-05,,test03-05,1,1,1298,315.108,,,131,junk,2005-12-10 13:33:01,junk,,,Line
test03,junk,junk,test03,junk,6,test03-06,,,line,test03-06,,test03-06,1,1,472,123.0805,,,49,junk,2005-12-10 13:33:01,junk,,,Line
test03,junk,junk,test03,junk,6,test03-06,,,line,test03-06(A),,test03-06(A),2,10346,11032,122.7696,,,71,junk,2005-12-10 13:33:01,junk,,,Line
test03,junk,junk,test03,junk,7,test03-07,,,line,test03-07,,test03-07,1,1,915,302.7532,,,93,junk,2005-12-10 13:33:01,junk,,,Line
test03,junk,junk,test03,junk,8,test03-08,,,line,test03-08,,test03-08,1,1,949,121.931,,,96,junk,2005-12-10 13:33:01,junk,,,Line
test03,junk,junk,test03,junk,9,test03-09,,,line,test03-09,,test03-09,1,1,888,45.7871,,,90,junk,2005-12-10 13:33:01,junk,,,Line
test03,junk,junk,test03,junk,10,test03-10,,,line,test03-10,,test03-10,1,1,1267,46.3325,,,128,junk,2005-12-10 13:33:02,junk,,,Line
test03,junk,junk,test03,junk,11,test03-11,,,line,test03-11,,test03-11,1,1,793,226.2124,,,81,junk,2005-12-10 13:33:02,junk,,,Line
test03,junk,junk,test03,junk,12,test03-12(A),,,line,test03-12(A),,test03-12(A),1,10001,11419,46.051,,,143,junk,2005-12-10 13:33:02,junk,,,Line
test03,junk,junk,test03,junk,13,test03-13,,,line,test03-13,,test03-13,1,1,997,46.358,,,101,junk,2005-12-10 13:33:02,junk,,,Line
test03,junk,junk,test03,junk,14,test03-14,,,line,test03-14,,test03-14,1,1,1417,226.2161,,,143,junk,2005-12-10 13:33:02,junk,,,Line
test03,junk,junk,test03,junk,15,test03-15,,,line,test03-15,,test03-15,1,1,1064,226.4525,,,108,junk,2005-12-10 13:33:02,junk,,,Line
test03,junk,junk,test03,junk,16,test03-16,,,line,test03-16,,test03-16,1,1,1117,46.2635,,,113,junk,2005-12-10 13:33:02,junk,,,Line
test03,junk,junk,test03,junk,17,test03-17,,,line,test03-17,,test03-17,1,1,1068,226.2436,,,108,junk,2005-12-10 13:33:02,junk,,,Line
test03,junk,junk,test03,junk,18,test03-18,,,line,test03-18,,test03-18,1,1,793,46.0021,,,81,junk,2005-12-10 13:33:02,junk,,,Line
test03,junk,junk,test03,junk,19,test03-19,,,line,test03-19,,test03-19,1,1,822,46.3057,,,84,junk,2005-12-10 13:33:02,junk,,,Line
test03,junk,junk,test03,junk,20,test03-20,,,line,test03-20,,test03-20,1,1,730,226.4006,,,74,junk,2005-12-10 13:33:02,junk,,,Line
test09,junk,junk,test09,junk,1,test09-01,,,line,test09-01,,test09-01,1,101,817,203.1869,,,73,junk,2005-12-10 13:35:33,junk,,,Line
test09,junk,junk,test09,junk,2,test09-02,,,line,test09-02,,test09-02,1,101,527,22.8505,,,44,junk,2005-12-10 13:35:33,junk,,,Line
test09,junk,junk,test09,junk,3,test09-03,,,line,test09-03,,test09-03,1,101,547,23.1425,,,46,junk,2005-12-10 13:35:33,junk,,,Line
test09,junk,junk,test09,junk,4,test09-04,,,line,test09-04,,test09-04,1,101,554,202.7603,,,47,junk,2005-12-10 13:35:33,junk,,,Line
test09,junk,junk,test09,junk,4,test09-04,,,line,test09-04A,,test09-04A,2,488,679,202.6553,,,21,junk,2005-12-10 13:35:33,junk,,,Line
test09,junk,junk,test09,junk,5,test09-05,,,line,test09-05,,test09-05,1,101,847,22.7665,,,76,junk,2005-12-10 13:35:33,junk,,,Line
test09,junk,junk,test09,junk,6,test09-06,,,line,test09-06,,test09-06,1,101,883,22.8696,,,80,junk,2005-12-10 13:35:33,junk,,,Line
test09,junk,junk,test09,junk,7,test09-07,,,line,test09-07,,test09-07,1,101,848,22.7554,,,76,junk,2005-12-10 13:35:33,junk,,,Line
test09,junk,junk,test09,junk,8,test09-08,,,line,test09-08,,test09-08,1,101,783,202.8404,,,70,junk,2005-12-10 13:35:33,junk,,,Line
 
And what have you tried so far ?

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Here's a thought:

Code:
awk -F, '
        $1 != last1 && NR>1 { print last1,count7,uniq7 ; count7=uniq7=0}
        $7 != last7 { uniq7++ }
        {count7++; last1=$1 ; last7=$7}
        END { print last1,count7,uniq7 }
' inputfile

Annihilannic.
 
PHV - I tried a couple of different methods, based on awk snippets I have taken from different postings, and trying to use some awk one liners - but none came close to giving me the unique part I need. So in short, nothing I tried so far came close! :-(

Annihilannic - yet again you come to the rescue with a great answer. Not sure I understand it fully - I never seem to be able to deconstruct awk answers to anythig meaningful. I tried it out on a couple of example inputfiles and it seems to work just fine!

A BIG thank you - have you ever thought of adding up how much time you guys have saved people? If so, you can add maybe 8 hours to your tally!

Thank you. [thumbsup]
 
I'll add comments to try and make sense of it. Remember every clause is executed for every line of the input file, except for the END clause, which is naturally executed when input processing is complete.

This solution makes the assumption that non-unique values will be consecutive in the input data (which is the case with your sample data).

Code:
awk -F, '
        # if the first field is different to the first field
        # on the previous line (skipping the first record),
        # then print the first field and counters, and reset
        # the counters
        $1 != last1 && NR>1 { print last1,count7,uniq7 ; count7=uniq7=0}
        # if the 7th field is different to the previous 7th
        # field, increase the unique count
        $7 != last7 { uniq7++ }
        # increase the non-unique count, update values of
        # the 1st and 7th fields for use when processing
        # the subsequent record
        {count7++; last1=$1 ; last7=$7}
        # print out the last set of results when we reach
        # end of input
        END { print last1,count7,uniq7 }
' inputfile

Annihilannic.
 
Great - got it.

I think it's the fact that so much can be done in one go in awk that foxes me ... I'm used to having to do each thing separtly ... I guess I have a man's single-task brain rather than a woman's multi-task brain ... and I assume awk is female!

Cheers!

 
Hi

littleIdiot said:
A BIG thank you [gray](...)[/gray]

Thank you. [thumbsup]
Nice thanking, but here on Tek-Tips there is a different way to express your appreciation. Please click on the

[navy]Thank Annihilannic
for this valuable post!
[/navy]​

to let other people know that Annihilannic is helpful member ( although we already know this ) and also this thread has a good solution to the problem in the Subject.

Feherke.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top