find and list duplicate lines in file

aschneck · Jan 14, 2003

Hi, maybe someone can help?
I have a file (about 1 MB in size) that contains the result of a grep script. The script ran over several thousand text files, extracted particular lines from them, and wrote those lines, together with their file names, to an output file:
grep -F -R "blabla" * |grep -v noblabla >outfile

So now the file contains lines like:
/dir1/dir2/test1.txt anything blabla - 1234

Next I sorted it on field 5, the number (1234 above), which changes often:
sort -k5 <outfile >out1file

Now my problem:
The numbers in field 5 change, but several of them occur more than once, in different files. For example, the file contains:

/dir1/dir2/test1.txt anything blabla - 789
/dir1/dir3/test21.txt anything blabla - 789

Now I need a listing of which numbers occur more than once and in which files they appear. Although I'm sure the number is always field 5 when space is the field delimiter, it would be better to use the minus sign as the separator, because it appears only once on each line.

My first idea was to read the numbers from the file line by line and search the file for each one, but because I'm very new to awk scripting I couldn't get it working.
Any help?
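The idea of taking each number and searching the file for it can be done in awk without a search loop by reading the file twice: the first pass counts how often each number occurs, the second prints the lines whose number was counted more than once. This is a minimal sketch, not something posted in the thread; it assumes the minus sign appears exactly once per line, and the function name list_dups and the sample lines are hypothetical.

```shell
#!/bin/sh
# Sketch (not from the thread): list every line whose number occurs more
# than once, with no sorting needed. Assumes the minus sign appears exactly
# once per line, so with FS set to "-" the awk field $2 is the number.
# The function name list_dups and the sample lines are hypothetical.

list_dups() {
  # The same file is passed twice. While NR == FNR, awk is reading the
  # first copy: count each number, then skip to the next line. On the
  # second copy, print the lines whose number was counted more than once.
  awk -F'-' 'NR == FNR { count[$2]++; next }
             count[$2] > 1' "$1" "$1"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' \
  '/dir1/dir2/test1.txt anything blabla - 789' \
  '/dir1/dir2/test2.txt anything blabla - 1234' \
  '/dir1/dir3/test21.txt anything blabla - 789' > "$sample"

list_dups "$sample"
```

Because the file is read twice, the input has to be a real file rather than a pipe; the output keeps the original line order.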

vgersh99 · Jan 14, 2003

no need for awk.

sort -u
or

sort whateverSwitches | uniq
vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
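uniq can also list duplicates instead of removing them: its -d option prints one line for each group of repeated lines, and -f 4 skips the first four blank-separated fields so that only the number is compared. A sketch assuming the number is always field 5, with hypothetical sample data:

```shell
#!/bin/sh
# Sketch: let sort and uniq find the duplicated numbers.
# Assumes the number is always the fifth blank-separated field.
# The sample data is hypothetical.

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' \
  '/dir1/dir2/test1.txt anything blabla - 789' \
  '/dir1/dir2/test2.txt anything blabla - 1234' \
  '/dir1/dir3/test21.txt anything blabla - 789' > "$sample"

# uniq compares only adjacent lines, so sort on field 5 first.
# -f 4 skips the first four fields; -d prints one line per repeated group.
sort -k5 "$sample" | uniq -d -f 4
```

This prints one representative line per duplicated number. GNU uniq also has -D, which prints every line of each duplicated group; it is a GNU extension, not POSIX.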

CaKiwi · Jan 14, 2003

Vlad,

The poster wants to list duplicate lines, not eliminate duplicates.

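There may be an easier way, but here's my solution. It compares each line only with the line before it, so the input must already be sorted by the number (like out1file). Take out the second print if you only want one line listed per duplicated number.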

Code:

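BEGIN { FS = "-" }        # split on the minus sign, so $2 is the number
{
  if ($2 == sv2) {        # same number as the previous line: a duplicate
    if (!flg) {           # first duplicate of this number?
      print sv0           # then also print the saved first line
      flg = 1
    }
    print                 # print the current line
  }
  else {                  # a new number starts a new group
    flg = 0
    sv0 = $0              # save the line in case the next one matches
    sv2 = $2              # save its number for the comparison
  }
}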

CaKiwi
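To run the script above, the lines with the same number must be adjacent, so the input is sorted first. Sorting with the minus sign as the separator keeps the sort key in line with FS = "-" in the awk program. A sketch that wraps the same logic in a shell function; the function name list_sorted_dups and the sample data are hypothetical:

```shell
#!/bin/sh
# Sketch: run the logic of the posted script on sorted input. It compares
# each line only with the previous one, so equal numbers must be adjacent;
# sort -t '-' -k2 sorts on the text after the minus sign, matching FS = "-".
# The function name and the sample data are hypothetical.

list_sorted_dups() {
  sort -t '-' -k2 "$1" | awk '
    BEGIN { FS = "-" }
    {
      if ($2 == sv2) {
        if (!flg) { print sv0; flg = 1 }
        print
      } else {
        flg = 0; sv0 = $0; sv2 = $2
      }
    }'
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' \
  '/dir1/dir2/test1.txt anything blabla - 789' \
  '/dir1/dir2/test2.txt anything blabla - 1234' \
  '/dir1/dir3/test21.txt anything blabla - 789' > "$sample"

list_sorted_dups "$sample"
```

Both 789 lines are printed; the 1234 line is not, because its number occurs only once.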

aschneck · Jan 14, 2003

Hi CaKiwi,

Just tried it, and it looks very good.
Thanks a lot. OK, since I'm new to awk I don't yet understand the functions/parameters (sigh...) like flg and sv0, so I have to read some documentation, because I don't like using things that I don't understand. ;-)
But it's working.
Thanks,
Axel

vgersh99 · Jan 14, 2003

Sorry, I misread the post.
vlad

CaKiwi · Jan 14, 2003

flg and sv0 are just variables. flg controls the printing of the first duplicate line, and sv0 saves the current line so that, when a duplicate is found, the first line of the group can be printed too. sv2 likewise holds the number from that saved line for the comparison.
CaKiwi
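The script never assigns flg, sv0 or sv2 before their first use. That works because an awk variable needs no declaration: until it is assigned it is the empty string in a string context and 0 in a numeric context, so !flg is true and sv2 starts out equal to "". A minimal demonstration; the function name show_defaults and the variable names flag and saved are hypothetical:

```shell
#!/bin/sh
# Sketch: awk variables need no declaration. Until assigned, a variable is
# the empty string in a string context and 0 in a numeric context.
# The function name and the variable names flag and saved are hypothetical.

show_defaults() {
  awk 'BEGIN {
    if (!flag)       print "flag is unset, so !flag is true"
    if (saved == "") print "saved compares equal to the empty string"
    print "flag + 1 =", flag + 1
  }'
}

show_defaults
```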

aschneck · Jan 14, 2003

OK, so variables in awk don't need a $ sign or anything else; that's what I was wondering about.
thanks again,
Axel

marsd · Jan 14, 2003

"OK, so variables inside of awk dont need a $ sign.."

No, the language's syntax is modeled after C, and IMO it is a breath of fresh air in that regard compared to Perl, the shell, etc.
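One more detail behind the $ question: in awk, $ is not part of a variable name but an operator that selects a field of the current line, so $1 is the first field, $NF the last, and $i the field whose number is held in the variable i. A small sketch using one of the sample lines from the thread; the function name show_fields is hypothetical:

```shell
#!/bin/sh
# Sketch: plain names (n, i) are variables; $ followed by a number or an
# expression selects a field of the current input line.
# The function name show_fields is hypothetical.

show_fields() {
  echo '/dir1/dir3/test21.txt anything blabla - 789' |
    awk '{
      n = NF               # n is an ordinary variable: the number of fields
      print "fields:", n
      print "first:", $1   # $1 is the first field
      print "last:", $NF   # $NF is the last field, here the number
      i = 5
      print "fifth:", $i   # $i is the field whose number is stored in i
    }'
}

show_fields
```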
