Match fields from different files

Status
Not open for further replies.

rswarich

Technical User
Jun 7, 2001
7
US
Have two files, each with the following format:
Example data in File1:
176, 488, 14, 475, 167, 497, 482, 617, 491, 168, 483
215, 106, 14, 276, 488, 105, 298, 299, 498, 497, 714
.
.
.

Example data in File2:
216, 475, 276, 14, 488, 601, 298, 482, 617, 714, 497
25, 488, 475, 476, 167, 617, 485, 616, 491, 483, 480
.
.
.

File1 might have up to 1000 lines and File2 might have
100000 lines.

Need to compare fields 2-11 for each line in File1 to fields 2-11 for each line in File2. The numbers in the fields will not be in increasing order.

Every time there are at least 4 matches in fields 2-11
on a line in File2 compared with a line in File1, print field 1 from File2 on a separate line in a new file (Newfile).

In the above example, lines 216 and 25 from File2 each have at least 4 matches with line 176 in File1 so field 1 (216 and 25) from File2 get printed each time.
Now compare the 215 line from File1 with each line in File2.
In File2, line 216 has at least 4 matches, so print field 1 (216), but line 25 has fewer than 4 matches, so print nothing.
The resulting printout in Newfile would be:
216,
25,
216,
Ideally would also like to eliminate all duplicates in Newfile (like the extra 216) if possible. It is OK if that is a separate step.

Any suggestions/help on how to do this in UNIX/awk/etc. would be appreciated.
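
The separate duplicate-removal step mentioned above could look like the sketch below. It assumes Newfile holds one key per line; dedupe_keep_order is a hypothetical helper name, not something from the thread.

```shell
# dedupe_keep_order FILE (hypothetical helper):
# print each distinct line of FILE once, in first-seen order.
dedupe_keep_order() {
    # seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
    # is true only for first occurrences, and awk prints those lines.
    awk '!seen[$0]++' "$1"
}
```

Usage would be something like: dedupe_keep_order Newfile > Newfile.uniq. If the original order does not matter, sort -u Newfile should give the same set of lines in sorted order.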
 
Hi rswarich-

The first test "print" statement should not be left enabled
when running the 100,000 line file: it prints a line for
every field comparison!

Run this program on your sample files of 2 lines
each with all commented lines un-commented,
then comment any or all of them and run your
big files.

#!/bin/sh

awk '
BEGIN { FS = "," }

# Strip blanks so " 488" and "488" compare as the same value.
function trim(s) { gsub(/[ \t\r]/, "", s); return s }

# file1 (read first): remember the values in fields 2-11 of each line.
FILENAME == "file1" {
    n1++
    for (j = 2; j <= 11; j++)
        if ((v = trim($j)) != "") fld[n1, v] = 1
    next
}

# file2: compare fields 2-11 with every stored file1 line.
{
    for (r = 1; r <= n1; r++) {
        k = 0    # reset the match count for each pair of lines
        for (i = 2; i <= 11; i++) {
            # print r " , " $i    # for testing only- comment out -or- delete
            if ((r, trim($i)) in fld) k++    # exact match, not a regex match
        }
        if (k >= 4 && data[$1]++ == 0)    # record each file2 key only once
            lines[++count] = trim($1)
    }
}

END {
    for (x = 1; x <= count; x++)
        print lines[x]
    # print "Matched records = " count    # for testing only- comment out -or- delete
}' file1 file2 > Newfile
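
To check the match counts by hand before trusting the threshold of 4, a sketch like the one below prints the count for every pair of lines. It is only an illustration: pair_counts, trim, id1 and has are hypothetical names, and it assumes comma-separated lines with the key in field 1 and no repeated values within a line.

```shell
# pair_counts FILE1 FILE2 (hypothetical helper):
# print "file1_key file2_key matches" for every pair of lines.
pair_counts() {
    awk -F',' '
    # Remove blanks so " 488" and "488" compare as equal.
    function trim(s) { gsub(/[ \t\r]/, "", s); return s }

    NR == FNR {                      # first file: store fields 2-11 of each line
        id1[++n1] = trim($1)
        for (i = 2; i <= 11; i++)
            if ((v = trim($i)) != "") has[n1, v] = 1
        next
    }
    {                                # second file: count exact matches per pair
        for (r = 1; r <= n1; r++) {
            k = 0                    # start from zero for every pair
            for (i = 2; i <= 11; i++)
                if ((r, trim($i)) in has) k++
            print id1[r], trim($1), k
        }
    }' "$1" "$2"
}
```

With the two sample lines from each file in the question, pair_counts file1 file2 should print:
176 216 6
215 216 6
176 25 6
215 25 1
which agrees with the worked example: only the pair 215/25 falls below 4 matches.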


Hope this helps you!



flogrr
flogr@yahoo.com

 