Intersection of values

Status
Not open for further replies.

rammalamma

Technical User
Jan 15, 2004
4
US
awk '
FNR==1 { FileCount++ }
{ rpm[$0]++ }
END { for (r in rpm) if (rpm[r] == FileCount) print r }
' file_*.txt


Can anyone explain this awk script to me? I have a directory of files; each file contains the rpms that are installed on a specific machine in the cluster. When I run this script, it gives me one file with the intersection of the rpms that exist on all machines.

It seems pretty simple, but I have no idea how this script works. I would also like to adjust it to give me the union of all the rpms.

FNR is the ordinal number of the current record in the current file.

FileCount++ is being incremented each time through.

rpm[$0]++ is an array, but I don't know what $0 is.

And this line is a total mystery:
END { for (r in rpm) if (rpm[r] == FileCount) print r }

What is "r in rpm"? What is r?



 
man awk

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Code:
awk '
        # Increment file count each time we hit the first record of an input
        # file
        FNR==1 { FileCount++ }
        # Increment the count of instances of this RPM found ($0 contains
        # entire contents of input line)
        { rpm[$0]++ }
        # When all input files have been processed...
        END {
                # For each index of the rpm array
                for (r in rpm) {
                        # If that RPM occurs as many times as there are input
                        # files, print it.
                        if (rpm[r] == FileCount) print r
                }
        }
' file_*.txt
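
To see what FNR and $0 hold on each record, a small trace can help. This is only a sketch: the temporary directory and the two sample files (file_a.txt, file_b.txt) are made up for illustration and are not from the original post.

```shell
#!/bin/sh
# Hypothetical sample data: two tiny "rpm list" files in a temp directory.
tmpdir=$(mktemp -d)
printf 'bash\nvim\n' > "$tmpdir/file_a.txt"
printf 'bash\n'      > "$tmpdir/file_b.txt"

# For every input line, show the file name, FNR (line number within the
# current file), NR (line number across all files) and $0 (the whole line).
trace=$(cd "$tmpdir" && awk '{ print FILENAME ": FNR=" FNR " NR=" NR " $0=" $0 }' file_a.txt file_b.txt)

printf '%s\n' "$trace"
rm -rf "$tmpdir"
```

FNR drops back to 1 at the start of each file while NR keeps counting, which is why FNR==1 fires exactly once per non-empty input file and FileCount ends up as the number of such files.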

In awk, arrays can be indexed by strings. Other languages, such as Perl, usually call these "hashes" rather than "arrays". So the rpm[] array is an array of counts indexed by RPM name, and for (r in rpm) iterates through those indices.
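
Since the original question also asked for the union, here is one possible sketch built on the same array. It is not from the script above: the sample files are hypothetical, and it assumes each RPM appears at most once per file (a duplicate line within one file would inflate its count and could distort the intersection).

```shell
#!/bin/sh
# Hypothetical sample data: three tiny "rpm list" files in a temp directory.
tmpdir=$(mktemp -d)
printf 'bash\nglibc\nvim\n'  > "$tmpdir/file_a.txt"
printf 'bash\nglibc\nperl\n' > "$tmpdir/file_b.txt"
printf 'bash\nglibc\n'       > "$tmpdir/file_c.txt"

# Intersection: print only the indices whose count equals the file count.
intersection=$(awk '
FNR==1 { FileCount++ }
{ rpm[$0]++ }
END { for (r in rpm) if (rpm[r] == FileCount) print r }
' "$tmpdir"/file_*.txt | LC_ALL=C sort)

# Union: print every index of the array, whatever its count.
union=$(awk '
{ rpm[$0]++ }
END { for (r in rpm) print r }
' "$tmpdir"/file_*.txt | LC_ALL=C sort)

echo "intersection:"; echo "$intersection"
echo "union:"; echo "$union"
rm -rf "$tmpdir"
```

The output is piped through sort because the order in which for (r in rpm) visits the indices is not specified.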

Annihilannic.
 
Thank you very much!!!

My brain doesn't hurt so much when I look at it now. :)
 