Duplicate fields

hugheskbh · Jan 6, 2004

How can I identify records that have duplicate values in a field (positions 2 through 6)? I need to identify duplicate employee numbers in a file.

Thanks

Ken

PHV · Jan 6, 2004

Try something like this:

Code:

awk '{++a[substr($0,2,5)]}
END{for(i in a)if(a[i]>1)printf "%s x %d\n",i,a[i]}
' /path/to/inputfile

You can also take a look at

Code:

 man sort

and

Code:

 man uniq
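
For reference, the sort/uniq route might look something like the following. This is only a minimal sketch: the temporary directory, the file name inputfile, and the sample records are made up for illustration, and it assumes the employee number always sits in columns 2 through 6.

```shell
# Hypothetical sample data; the employee number is in columns 2-6.
tmpdir=$(mktemp -d)
printf '%s\n' A22333test1 A25555test2 A22333test3 > "$tmpdir/inputfile"

# Extract columns 2-6, sort so that equal keys become adjacent,
# then let uniq -d print one copy of each key that occurs more than once.
dups=$(cut -c2-6 "$tmpdir/inputfile" | sort | uniq -d)
echo "$dups"

rm -rf "$tmpdir"
```

Using uniq -c instead of uniq -d would show a count next to every key, which is closer to what the awk one-liner prints.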

Hope this helps
PH.

vgersh99 · Jan 6, 2004

What does a sample record look like?

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

hugheskbh · Jan 6, 2004

An example will look like this:

File1
A22333test1
A25555test2
A88888test3

File2
A33333test1
A54444test2
A22333test3

Note the duplicate 22333 in both files.

PHV · Jan 6, 2004

Try something like this:

Code:

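awk '{
 # k = employee number in columns 2-6; i = how many times k has been seen so far
 k=substr($0,2,5);i=++a[k]
 # remember each occurrence together with the file it came from
 b[k","i]=FILENAME":"$0
}
END{
 # print every stored record whose key occurred more than once
 for(k in a)if(a[k]>1)
  for(i=1;i<=a[k];++i)print b[k","i]
 printf "\n"
}' File1 File2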

Hope this helps
PH.

Ygor · Jan 7, 2004

You could use a pattern file with grep...

Code:

cut -c2-6 file1 >tmpfile
grep -f tmpfile file2
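
One caveat with a plain pattern file: grep -f matches each pattern anywhere in the line, so an employee number from file1 could also match text outside columns 2 through 6 of file2. A possible refinement, sketched here on the assumption that the key is always exactly in columns 2-6, is to anchor each pattern with ^. so it can only match at that position. The temporary directory and the sample records (including the extra B9922333xx line) are invented for illustration.

```shell
# Hypothetical sample data modelled on the records posted above.
tmpdir=$(mktemp -d)
printf '%s\n' A22333test1 A25555test2 A88888test3 > "$tmpdir/file1"
printf '%s\n' A33333test1 A54444test2 A22333test3 B9922333xx > "$tmpdir/file2"

# Turn each key into an anchored pattern such as ^.22333
# (start of line, any one character, then the five-character key).
cut -c2-6 "$tmpdir/file1" | sed 's/^/^./' > "$tmpdir/patterns"

# Only lines of file2 whose columns 2-6 equal a key from file1 are printed.
matches=$(grep -f "$tmpdir/patterns" "$tmpdir/file2")
echo "$matches"

rm -rf "$tmpdir"
```

With the unanchored patterns the B9922333xx line would also be reported, because 22333 appears in it at a different position.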
