Hi, maybe someone can help?
I have a file (about 1 MB in size) that contains the output of a grep script. The script ran over several thousand text files, pulled specific lines out of them, and wrote those lines together with their file names into an output file:
grep -F -R "blabla" * |grep -v noblabla >outfile
So now this file contains lines like
/dir1/dir2/test1.txt anything blabla - 1234
Next I sorted it with
sort -k5 <outfile >out1file
where the number (1234 in the example) differs from line to line.
Now my problem:
The numbers in field 5 vary, but several of them occur more than once and in different files, e.g. the file contains
/dir1/dir2/test1.txt anything blabla - 789
/dir1/dir3/test21.txt anything blabla - 789
Now I need a listing of which numbers occur more than once and which files they are in. I'm fairly sure the number is always field 5 when using space as the field delimiter, but it would be better to use the minus sign as the separator, because it always appears exactly once per line.
My first idea was to read the numbers line by line from the file and search for each one in the file, but because I'm very new to awk scripting I couldn't get it working.
Any help?
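One possible approach, as a minimal sketch rather than a definitive answer: let awk split each line on " - ", count how often each number appears, and collect the file names per number in a single pass. The function name list_dups and the sample data are made up for illustration, and the sketch assumes that " - " appears exactly once per line and that the file name is the first space-separated word.

```shell
# Hypothetical sample data in the same shape as the lines in the post.
cat > out1file <<'EOF'
/dir1/dir2/test1.txt anything blabla - 789
/dir1/dir3/test21.txt anything blabla - 789
/dir1/dir4/test5.txt anything blabla - 1234
EOF

# list_dups FILE (hypothetical helper):
# print "number: file1 file2 ..." for every number found on more than one line.
list_dups() {
    awk -F' - ' '
        {
            num = $NF                  # text after the " - " separator
            split($1, parts, " ")      # parts[1] is the file name
            count[num]++
            if (num in files)
                files[num] = files[num] " " parts[1]
            else
                files[num] = parts[1]
        }
        END {
            # Iteration order over the numbers is unspecified in awk.
            for (num in count)
                if (count[num] > 1)
                    print num ": " files[num]
        }
    ' "$1"
}

list_dups out1file
```

Because awk keeps the counts in memory, the input does not have to be sorted first, so this should work on outfile as well as on out1file. If an ordered listing is wanted, the result can be piped through sort -n.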