Duplicate fields

Status
Not open for further replies.

hugheskbh

Programmer
Dec 18, 2002
37
US
How can I identify records that have duplicate values in a field (positions 2 through 6)? I need to identify duplicate employee numbers in a file.

Thanks

Ken
 
Try something like this:
Code:
awk '{++a[substr($0,2,5)]}
END{for(i in a)if(a[i]>1)printf "%s x %d\n",i,a[i]}
' /path/to/inputfile
You can also take a look at
Code:
 man sort
and
Code:
 man uniq
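
The sort and uniq route can be sketched as follows. This is a minimal example, assuming fixed-width records with the employee number in columns 2 through 6; the file name employees.txt and its contents are made up for illustration.

```shell
# Hypothetical sample data; the employee number sits in columns 2-6.
workdir=$(mktemp -d)
printf '%s\n' A22333test1 A25555test2 A22333test3 > "$workdir/employees.txt"

# Extract columns 2-6, sort so equal keys become adjacent,
# then print one copy of every key that occurs more than once.
dups=$(cut -c2-6 "$workdir/employees.txt" | sort | uniq -d)
echo "$dups"
```

Swapping uniq -d for uniq -c would list every key with its count instead, which is closer to what the awk one-liner reports.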

Hope this helps
PH.
 
what does a sample record look like?

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
An example will look like this:

File1
A22333test1
A25555test2
A88888test3


File2
A33333test1
A54444test2
A22333test3

Note the duplicate 22333 in both files.
 
Try something like this:
Code:
awk '{
 k=substr($0,2,5);i=++a[k]
 b[k","i]=FILENAME":"$0
}
END{
 for(k in a)if(a[k]>1)
  for(i=1;i<=a[k];++i)print b[k","i]
 printf "\n"
}' File1 File2


Hope this helps
PH.
 
You could use a pattern file with grep...
cut -c2-6 file1 >tmpfile
grep -f tmpfile file2
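
One caveat with the pattern-file approach: grep -f matches each pattern anywhere in the line, so a key such as 25555 would also hit a record that merely contains those digits outside columns 2 through 6. A possible variant, assuming the keys contain no regex metacharacters, anchors each pattern to columns 2-6 with sed. The last record in file2 below is made up to show the difference; the other records are the sample data from this thread.

```shell
workdir=$(mktemp -d)
printf '%s\n' A22333test1 A25555test2 A88888test3 > "$workdir/file1"
# The B99999x25555 line is hypothetical: 25555 appears, but not in columns 2-6.
printf '%s\n' A33333test1 A54444test2 A22333test3 B99999x25555 > "$workdir/file2"

# Turn each key into an anchored pattern such as "^.22333":
# "^" pins the match to the start of the line and "." skips column 1.
cut -c2-6 "$workdir/file1" | sed 's/^/^./' > "$workdir/patterns"

# Only records whose columns 2-6 equal a key from file1 are printed.
matches=$(grep -f "$workdir/patterns" "$workdir/file2")
echo "$matches"
```

With the unanchored patterns the B99999x25555 record would be reported as well.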
 