
need help with awk parsing /etc/passwd

Status
Not open for further replies.

awkster

Technical User
Dec 11, 2004
1
US
I'm writing an awk program to parse an /etc/passwd file that contains records like:

jones::100:100:John Jones:/usr/bin/csh
fred::200:0:Fred Franklin:/usr/bin/csh
bob::300:100:Bob Smith:/usr/bin/csh
jjones::600:100:John Jones:/usr/bin/csh
admin::200:100:Fred Franklin:/usr/bin/csh
useless::200:0::/usr/bin/csh
...

From this file I used cut and egrep to create another file named "names". It contains the 5th field of each /etc/passwd record, with blank lines removed, and looks like:

John Jones
Fred Franklin
Bob Smith
John Jones
Fred Franklin
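
The post does not show the exact cut and egrep command, so the following is only a sketch of one way such a file could be produced. The function name make_names is hypothetical, and plain grep -v is used here in place of egrep to drop the empty lines.

```shell
# Sketch: print the 5th colon-separated field of each record read from
# standard input, dropping records whose name field is empty.
# make_names is a hypothetical helper name, not something from the post.
make_names() {
    cut -d: -f5 | grep -v '^$'
}
```

It could be used as `make_names < /etc/passwd > names`.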

Now I'm trying to use awk to examine each line of the /etc/passwd file and report the following:
- the total number of records in /etc/passwd
- the % of records that are duplicates
- the duplicate users (by full name) and how many entries each one has in /etc/passwd

So I'm using the "names" file to create an associative array called users_array, but I am completely stuck after that point. Here is what I have so far. Any help would be greatly appreciated!

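BEGIN {
    # Load each name from the "names" file as a key with a count of zero.
    # Comparing against 0 ends the loop on end-of-file and on a read error.
    while ( (getline var < "names") > 0 ) {
        users_array[var] = 0
        print var
        print users_array[var]
    }
}
# this is a lot harder than perl in my opinion
# for each name process each line in /etc/passwd

END {
    print FILENAME
}
# print output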
 
Code:
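# Input: one full name per line, such as the "names" file.
# If line isn't empty (field 1 exists), execute code.
$1 \
{ if ( $0 in names )
  { # Name already seen, so this record is a duplicate.
    dup_names[ $0 ] = 1
    dups++
  }
  names[$0]++
  ## We could use NR for total number of records,
  ## but blank lines would throw it off.
  count++
}

END {
  print "Total records: " (count + 0)
  ## Guard against division by zero when there is no input.
  if ( count )
    printf "Duplicate records: %.1f%%\n", dups/count*100
  for ( name in dup_names )
    printf "%-20s%4d\n", name, names[name]
}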

If you have nawk, use it instead of awk because on some systems awk is very old and lacks many useful features. For an introduction to Awk, see faq271-5564.
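
The script above expects one name per line. As a rough sketch of a variation, the same counting could be done directly on the passwd-style file by splitting on colons and using field 5, which would skip the intermediate "names" file. This assumes the full name is in field 5, as in the sample records. The wrapper name count_dups is hypothetical.

```shell
# Sketch: report duplicate full names straight from a passwd-style file.
# Assumes the full name is in field 5, as in the sample records.
# count_dups is a hypothetical wrapper name.
count_dups() {
    awk -F: '
    $5 != "" {                  # skip records with an empty name field
        if ($5 in names) {      # name seen before: this record is a duplicate
            dup_names[$5] = 1
            dups++
        }
        names[$5]++
        count++                 # counts only records that have a name
    }
    END {
        print "Total records: " (count + 0)
        if (count > 0)          # avoid division by zero on empty input
            printf "Duplicate records: %.1f%%\n", dups / count * 100
        for (name in dup_names)
            printf "%-20s%4d\n", name, names[name]
    }' "$@"
}
```

It could be run as `count_dups /etc/passwd`. On the six sample records from the question it should report 5 named records and 40.0% duplicates, then list John Jones and Fred Franklin with 2 entries each, in no particular order. Like the script above, it counts only the repeat occurrences of a name as duplicates.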

Let me know whether or not this helps.
 