Grouping and count 1

Status
Not open for further replies.

malpa

Technical User
Feb 8, 2004
122
CO
Hi

I have this file

file.txt

source target times
A B 1
A B 2
A C 4
B A 3
B K 1
B C 8
B D 2
B D 1
C A 1

desired output file

source, different targets, total times
A 2 (B,C) 7
B 4 (A,K,C,D) 15
C 1 (A) 1


Thanks malpa
 
I know this isn't your first encounter with awk, malpa, so what have you tried so far?

Annihilannic.
 
Hi Annihilannic

You are right, but I have some problems with arrays.

This is my initial program. I want to do this using only arrays, but I don't know how.

awk ' BEGIN{}
{
        source=$1
        target=$2
        times=$3
        S[source]+=1
        T[target]+=1
        S_D[source","target]++
        R[source","S_D[source","target]]++
        D[source","S_D[source","target]]+=times
}
END{
        for ( i in R)
                print i","R","D
} ' file.txt



However, I solved it. This is my second program, but I don't like it.

#A B 1
#A B 2
#A C 4
#B A 3
#B K 1
#B C 8
#B D 2
#B D 1
#C A 1

awk ' BEGIN{FS=" +"}
{
        z[$1]=z[$1]"|"$2
        t[$1]+=$3
}
END{
        for ( k in z){
                split(z[k],array,"|")
                n=asort(array)
                for ( j = 2; j<=n; j++ ){
                        h[array[j]]++
                }
                l=asort(h,y)
                printf "%s,%s (",k,l
                for ( i in h) printf"%s", i
                printf "),%s\n",t[k]
                delete h
        }
} ' file.txt


output

A,2 (BC),7
B,4 (ACDK),15
C,1 (A),1


I would appreciate it if you could give me some tips on doing this with arrays.

Thanks malpa
 
Unfortunately awk has no true two-dimensional arrays: a subscript such as [source,target] is just the two values joined with SUBSEP into a single string key, so it is awkward to iterate over one dimension using a for (index in array) {} construct. I'm not sure that's exactly what you were trying to do, but they can still be useful for this problem.

Code:
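awk '
        # NR>1 skips the header line
        NR>1 {
                source=$1
                target=$2
                times=$3
                # test with "in" so a pair whose times add up to 0 is not counted twice
                if (!((source,target) in S_D)) {
                        # new target: append it, with a leading comma if it is not the first
                        R[source]=R[source] ( T[source] ? "," : "" ) target
                        T[source]++
                }
                D[source]+=times
                S_D[source,target]+=times
        }
        END {
                # the order of for (i in D) is not defined in awk
                for (i in D)
                        print i,T[i]" ("R[i]")",D[i]
        }
' file.txt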

I'm not sure I've used the arrays in the same way you have (personally I would give them more descriptive names!), but S_D keeps track of the source and target combinations already found, T keeps the count of unique targets for each source, R contains the human-readable list of different targets, and D contains the total times for each source. D is also used as the basis to iterate over for the final printout. The fancy ? "," : "" stuff just adds a comma before a target when the source already has at least one, so you don't end up with extra commas.
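
If you do need to walk through a two-subscript array such as S_D, one way is to split each combined key on SUBSEP. The following is only a minimal sketch, not part of the solution above: the function name pair_totals and the array names pair and part are made up for illustration, and it assumes the same input layout as file.txt (one header line, then source, target and times).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print every source/target pair
# with its summed times, one pair per line.
pair_totals() {
    awk '
        NR > 1 {
            # pair[$1, $2] is stored under the single string key $1 SUBSEP $2
            pair[$1, $2] += $3
        }
        END {
            for (key in pair) {
                # recover the two subscripts from the combined key
                split(key, part, SUBSEP)
                print part[1], part[2], pair[key]
            }
        }
    ' "$1" | sort    # for (key in pair) has no defined order, so sort the lines
}
```

Calling pair_totals file.txt lists one line per distinct pair. The sort at the end is there only because awk does not promise any particular order for a for (key in array) loop.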

Annihilannic.
 
Hi

Annihilannic

Yes sir, you are right. I have to improve the way I write these programs, and I have to learn more about awk.

Your comments are right.

Thanks for your comments.

If possible, could you send me a link or a document about arrays in awk, or some examples that use arrays in awk?


I will appreciate it.


Thanks
Malpa
 