
remove duplicate lines with awk

Status
Not open for further replies.

uksnowman

Technical User
Apr 2, 2001
4
CH
Hi awkers.

I have written a nawk script to parse a large log file (thousands of lines). I now have the log in the correct format and need to remove all duplicate lines, but I would prefer to do this using an awk function.

Is it possible to do this using awk? I know I can use uniq to do the same thing, but it would mess up some of the post-processing I have done.
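For example (a hypothetical log, just to illustrate the problem): plain uniq only drops *adjacent* duplicates, and sorting first would reorder the lines:

```shell
# hypothetical demo log: the repeated "b" lines are not adjacent
printf 'b\na\nb\nc\n' > demo.log

uniq demo.log          # prints: b, a, b, c  (duplicate "b" survives)
sort demo.log | uniq   # prints: a, b, c     (duplicates gone, order lost)
```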

Thanks,
uksnowman
 
Hi, uksnowman!

You can try this awk command to remove duplicate lines from a file:

awk '{ line[$0] ++; if (line[$0] < 2) print }' inputfile > newfile

This little awk script works well with small files: my test file had lines 559 characters long and it handled them fine. You can send me a message about your experience.
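A more compact variant of the same idea (standard awk; the file names here are hypothetical) prints a line only while its counter is still zero:

```shell
# hypothetical demo input; the real file would be your parsed log
printf 'a\nb\na\nb\nc\n' > inputfile

# !seen[$0] is true only the first time a line appears;
# the ++ then marks it as seen for all later occurrences
awk '!seen[$0]++' inputfile > newfile
cat newfile   # prints: a, b, c
```

Note that memory use grows with the number of distinct lines, same as the longer version above.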

KP.
 
Hi all!
This is my first post to the forum.
My script below compares each input line with the contents of the output file ("new.log") using the getline command. I believe it should work with large files, but I have not tried it.
-----------------------------------
#deldup.awk
{
  dupexists = 0
  tmp = $0
  while ( ((getline < "new.log") > 0) && (dupexists == 0) ) {
    if (tmp == $0) {
      dupexists = 1
    }
  }
  close("new.log")
  if (dupexists == 0) {
    if (NR == 1) { print tmp > "new.log" }  # start fresh
    else { print tmp >> "new.log" }  # append
    close("new.log")
  }
}
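If it helps, here is a hypothetical end-to-end run (the script body above saved as deldup.awk; the sample log name is made up). Deleting any stale new.log first matters, because the script would otherwise compare the first input line against leftover output from a previous run:

```shell
# save the posted script to a file (same code as above)
cat > deldup.awk <<'EOF'
{
  dupexists = 0
  tmp = $0
  while ( ((getline < "new.log") > 0) && (dupexists == 0) ) {
    if (tmp == $0) {
      dupexists = 1
    }
  }
  close("new.log")
  if (dupexists == 0) {
    if (NR == 1) { print tmp > "new.log" }  # start fresh
    else { print tmp >> "new.log" }  # append
    close("new.log")
  }
}
EOF

# hypothetical sample log with non-adjacent duplicates
printf 'a\nb\na\nc\nb\n' > big.log
rm -f new.log             # remove stale output so old lines are not treated as duplicates
awk -f deldup.awk big.log
cat new.log               # prints: a, b, c
```

Keep in mind it rescans new.log once per input line, so the cost grows roughly quadratically with file size; the in-memory array approach posted earlier will be much faster on thousands of lines.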
 