
find and list duplicate lines in file

Status
Not open for further replies.

aschneck

IS-IT--Management
Oct 31, 2002
12
DE
Hi, maybe someone can help?
I have a file (about 1 MB in size) that contains the result of a grep script. The script ran over several thousand text files, pulled specific lines out of them, and wrote them together with their file names into an output file:
grep -F -R "blabla" * |grep -v noblabla >outfile

So now this file contains lines like
/dir1/dir2/test1.txt anything blabla - 1234

Next I sorted it with
sort -k5 <outfile >out1file
where the number (1234 in the example) varies from line to line.

Now my problem:
The numbers in field 5 change, but several numbers occur more than once and in different files, e.g. the file contains

/dir1/dir2/test1.txt anything blabla - 789
/dir1/dir3/test21.txt anything blabla - 789

Now I need a listing of which of those numbers occur more than once and in which files they are. I'm not absolutely sure that the number is always in field 5 when space is the field delimiter, so it would be better to use the minus as the separator, because it always occurs exactly once per line.

My first idea was to read the numbers line by line from the file and search for each of them in the file, but because I'm very new to awk scripting I couldn't get it working.
Any help?
 
no need for awk.

sort -u
or

sort whateverSwitches | uniq

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Vlad,

The poster wants to list duplicate lines, not eliminate duplicates.

There may be an easier way but here's my solution. Take out the second print if you only want to list the lines once.
Code:
BEGIN { FS="-" }
{
  if ($2 == sv2) {
   if (!flg) {
     print sv0
     flg = 1
   }
   print
  }
  else {
    flg = 0
    sv0 = $0
    sv2 = $2
  }
}
CaKiwi
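
For anyone trying this out, here is a minimal sketch of how the script above might be run end to end. The sample lines are made up for illustration, and the `sort -t- -k2` call is an assumption (the original poster sorted with `sort -k5`); either way the script relies on equal numbers sitting on adjacent lines.

```shell
#!/bin/sh
# Made-up sample lines in the format shown in the original post.
sample='/dir1/dir2/test1.txt anything blabla - 789
/dir1/dir4/test7.txt anything blabla - 1234
/dir1/dir3/test21.txt anything blabla - 789
/dir1/dir5/test9.txt anything blabla - 42'

# Sort on the text after the "-" so equal numbers become adjacent,
# then print every line whose number matches the previous line.
dups=$(printf '%s\n' "$sample" |
  sort -t- -k2 |
  awk 'BEGIN { FS = "-" }
  {
    if ($2 == sv2) {      # same number as the previous line
      if (!flg) {         # first repeat: emit the saved first line
        print sv0
        flg = 1
      }
      print               # emit the current duplicate
    } else {              # new number: remember it
      flg = 0
      sv0 = $0
      sv2 = $2
    }
  }')

printf '%s\n' "$dups"
```

With this sample, only the two lines ending in 789 are printed.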
 
Hi CaKiwi,

just tried it, looks very good.
Thanks a lot. Right now, because I'm new to awk, I don't understand the functions/parameters (sigh...) like flg and sv0, so I have to read some documentation, because I don't like using things that I don't understand. ;-)
But it's working.
Thanks,
Axel
 
sorry - I've misread the post.

vlad
 

flg and sv0 are just variables. flg is used to control the printing of the first duplicate line, and sv0 is used to save the current line so that when a duplicate is found, the first duplicate line can be printed out.

CaKiwi
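
For comparison, here is a sketch of a different technique that avoids the flg/sv0 bookkeeping and does not need sorted input: it counts each number in an awk array and prints only the groups seen more than once. This is not the script from the post above; the names `count`, `lines` and `key` and the sample data are hypothetical, and it holds the whole input in memory, which should be fine for a file of about 1 MB.

```shell
#!/bin/sh
# Made-up sample lines in the format shown in the original post.
sample='/dir1/dir2/test1.txt anything blabla - 789
/dir1/dir4/test7.txt anything blabla - 1234
/dir1/dir3/test21.txt anything blabla - 789
/dir1/dir5/test9.txt anything blabla - 42'

# Count how often each key (the text after "-") occurs and collect
# the lines per key; at the end print only keys seen more than once.
dups2=$(printf '%s\n' "$sample" |
  awk -F- '
  {
    count[$2]++
    lines[$2] = lines[$2] $0 "\n"
  }
  END {
    for (key in count)
      if (count[key] > 1)
        printf "%s", lines[key]
  }')

printf '%s\n' "$dups2"
```

Lines within a group keep their input order, but the order of the groups themselves is unspecified in awk, so pipe the result through sort if that matters.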
 
OK, so variables inside awk don't need a $ sign or anything else; that was what I was wondering about.
thanks again,
Axel
 
"OK, so variables inside of awk don't need a $ sign.."

No, they don't. The language is syntactically modeled after C and is,
IMO, a breath of fresh air in that regard compared to
Perl, the shell, etc.
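
A small sketch of the distinction, in case it helps: in awk a plain name is a variable, while `$` is the field operator applied to an expression. The variable name `n` and the input line are made up for illustration.

```shell
#!/bin/sh
out=$(printf '%s\n' 'alpha beta gamma' |
  awk '{
    n = 2            # plain variable, no sigil needed
    print n          # prints the value of n
    print $n         # $ is the field operator: prints field 2
    print $(n + 1)   # works on any expression: prints field 3
  }')

printf '%s\n' "$out"
```

This prints 2, beta and gamma on separate lines, which is why `$2` in the script above means "second field" rather than "a variable named 2".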
 