Match fields from different files

rswarich · Jun 7, 2001

Have two files, each with the following format:
Example data in File1:
176, 488, 14, 475, 167, 497, 482, 617, 491, 168, 483
215, 106, 14, 276, 488, 105, 298, 299, 498, 497, 714
.
.
.

Example data in File2:
216, 475, 276, 14, 488, 601, 298, 482, 617, 714, 497
25, 488, 475, 476, 167, 617, 485, 616, 491, 483, 480
.
.
.

File1 might have up to 1000 lines and File2 might have
100000 lines.

Need to compare fields 2-11 for each line in File1 to fields 2-11 for each line in File2. The numbers in the fields will not be in increasing order.

Every time there are at least 4 matches in fields 2-11
on a line in File2 compared with a line in File1, print field 1 from File2 on a separate line in a new file (Newfile).

In the above example, lines 216 and 25 from File2 each have at least 4 matches with line 176 in File1 so field 1 (216 and 25) from File2 get printed each time.
Now compare the 215 line from File1 with each line in File2.
In File2, line 216 has at least 4 matches, so print field 1 (216), but line 25 has fewer than 4 matches, so print nothing.
The resulting printout in Newfile would be:
216,
25,
216,
Ideally would also like to eliminate all duplicates in Newfile (like the extra 216) if possible-ok if that is a separate step.

Any suggestions/help on how to do this in UNIX/awk/etc. would be appreciated.
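
The separate de-duplication step mentioned in the question could be sketched as below. This is only an illustration: the helper name dedupe_keep_order is made up for this example, and it assumes Newfile holds one value per line. It keeps the first occurrence of each line and preserves the original order.

```shell
#!/bin/sh

# dedupe_keep_order FILE  (hypothetical helper name)
# Print each distinct line of FILE once, in first-seen order.
dedupe_keep_order() {
    # seen[$0]++ is 0 (false) the first time a line appears, so
    # the negation is true only for the first occurrence.
    awk '!seen[$0]++' "$1"
}
```

A possible call would be "dedupe_keep_order Newfile > Newfile.uniq". If the output order does not matter, "sort -u Newfile" gives the same set of lines in sorted order.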

flogrr · Jun 11, 2001

Hi rswarich-

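The first test "print" statement should not be used
when running the 100,000-line file: it prints every
field comparison, which is 100 lines of output for
each pair of records.

Run this program on your two-line sample files first,
with all the commented-out test lines uncommented. Then
comment any or all of them out again and run it on your
big files. The script expects the input files to be
named file1 and file2.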

#!/bin/sh

awk '
BEGIN {
    FS = ","

    # Outer loop: read file1 one line at a time.
    while ((getline line < "file1") > 0) {

        split(line, fld, ",")

        # Inner loop: re-read all of file2 for each file1 line.
        while ((getline < "file2") > 0) {

            k = 0    # reset the match count for each pair of lines

            for (i = 2; i <= 11; i++) {
                for (j = 2; j <= 11; j++) {
                    # print fld[j]" , "$i # for testing only- comment out -or- delete
                    # Adding 0 forces a numeric comparison, so the
                    # blanks after the commas are ignored.
                    if ($i + 0 == fld[j] + 0) k++
                }
            }

            # Save field 1 of file2 the first time the line qualifies.
            if (k >= 4) {
                if (data[$1]++ == 0)
                    lines[++count] = $1
            }
        }

        close("file2")    # so the next pass starts at the top again
    }
    close("file1")

    for (x = 1; x <= count; x++)
        print lines[x]
    # print "Matched fields = "k # for testing only- comment out -or- delete
    # print "Matched records = "count # for testing only- comment out -or- delete

}' > Newfile
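
The script above re-reads file2 once for every line of file1. An alternative sketch, which is not the approach used above, holds the smaller file1 in memory and reads file2 only once. The function name match_fields and its two arguments are made up for this example. It counts how many of fields 2-11 on a file2 line also occur on a file1 line, which is the same count as the nested loops only if no value is repeated within a line, and it prints each qualifying field 1 once, in file2 order.

```shell
#!/bin/sh

# match_fields FILE1 FILE2  (hypothetical helper name)
# Print field 1 of every FILE2 line that shares at least 4 values
# in fields 2-11 with some FILE1 line. Assumes FILE1 is not empty.
match_fields() {
    awk -F',' '
    # First file: record which values occur in fields 2-11 of each line.
    NR == FNR {
        n++
        for (i = 2; i <= 11 && i <= NF; i++)
            seen[n, $i + 0] = 1    # adding 0 drops the blank after the comma
        next
    }
    # Second file: test the current line against every stored line.
    {
        for (r = 1; r <= n; r++) {
            k = 0
            for (i = 2; i <= 11 && i <= NF; i++)
                if ((r, $i + 0) in seen) k++
            if (k >= 4) {
                # Print each field 1 only once, then stop checking
                # the remaining stored lines for this record.
                if (!printed[$1]++) print $1
                break
            }
        }
    }' "$1" "$2"
}
```

A possible call would be "match_fields file1 file2 > Newfile". Because duplicates are suppressed as the lines are printed, no separate clean-up step is needed with this variant.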

Hope this helps you!

flogrr
flogr@yahoo.com
