
remove duplicate lines with awk

Status
Not open for further replies.

uksnowman

Technical User
Apr 2, 2001
4
CH
Hi awkers.

I have written a nawk script to parse a large log file (thousands of lines). I now have the log in the correct format and need to remove all duplicate lines, but I would prefer to do this using an awk function.

Is it possible to do this using awk? I know I can use uniq to do the same thing, but it would mess up some of the post-processing I have done.
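For example (a hypothetical log, just to illustrate the problem): plain uniq only drops *adjacent* duplicates, and sorting first would reorder the lines:

```shell
# hypothetical demo log: the repeated "b" lines are not adjacent
printf 'b\na\nb\nc\n' > demo.log

uniq demo.log          # prints: b, a, b, c  (duplicate "b" survives)
sort demo.log | uniq   # prints: a, b, c     (duplicates gone, order lost)
```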

Thanks,
uksnowman
 
Hi, uksnowman!

You can try this awk command to remove duplicate lines from a file:

awk '{ line[$0] ++; if (line[$0] < 2) print }' inputfile > newfile

This little awk script works well with small files: my test file had lines 559 characters long and it handled them fine. You can send me a message about your experience.
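A more compact variant of the same idea (standard awk; the file names here are hypothetical) prints a line only while its counter is still zero:

```shell
# hypothetical demo input; the real file would be your parsed log
printf 'a\nb\na\nb\nc\n' > inputfile

# !seen[$0] is true only the first time a line appears;
# the ++ then marks it as seen for all later occurrences
awk '!seen[$0]++' inputfile > newfile
cat newfile   # prints: a, b, c
```

Note that memory use grows with the number of distinct lines, same as the longer version above.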

KP.
 
Hi all!
This is my first post to the forum.
My script below compares each input line with the contents of the output file ("new.log") using the getline command. I believe it should work with large files, but I have not tried it.
-----------------------------------
#deldup.awk
{
  dupexists = 0
  tmp = $0
  while ( ((getline < "new.log") > 0) && (dupexists == 0) ) {
    if (tmp == $0) {
      dupexists = 1
    }
  }
  close("new.log")
  if (dupexists == 0) {
    if (NR == 1) { print tmp > "new.log" }  # start fresh
    else { print tmp >> "new.log" }  # append
    close("new.log")
  }
}
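If it helps, here is a hypothetical end-to-end run (the script body above saved as deldup.awk; the sample log name is made up). Deleting any stale new.log first matters, because the script would otherwise compare the first input line against leftover output from a previous run:

```shell
# save the posted script to a file (same code as above)
cat > deldup.awk <<'EOF'
{
  dupexists = 0
  tmp = $0
  while ( ((getline < "new.log") > 0) && (dupexists == 0) ) {
    if (tmp == $0) {
      dupexists = 1
    }
  }
  close("new.log")
  if (dupexists == 0) {
    if (NR == 1) { print tmp > "new.log" }  # start fresh
    else { print tmp >> "new.log" }  # append
    close("new.log")
  }
}
EOF

# hypothetical sample log with non-adjacent duplicates
printf 'a\nb\na\nc\nb\n' > big.log
rm -f new.log             # remove stale output so old lines are not treated as duplicates
awk -f deldup.awk big.log
cat new.log               # prints: a, b, c
```

Keep in mind it rescans new.log once per input line, so the cost grows roughly quadratically with file size; the in-memory array approach posted earlier will be much faster on thousands of lines.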
 