Two Files Two Column Comparison

Status
Not open for further replies.

beaster

Technical User
Aug 20, 2001
225
US
I need help with a script to compare two files. I know the
command "diff" but I think it needs to be a bit more complex
to do what I want.

I have two files. They are in the same directory. They both have
similar text in them.

1st File: file1
2nd File: file2

The newest file is file1 and has text like:

VA1419A 671 631 1622 151 43 360

VA1439A 537 481 1139 111 10 360

VA1433A 1160 1157 3703 412 10 360

VA1373B 338 323 978 82 10 360

VA0345B 948 947 4136 324 10 360

VA0337B 1821 1815 8802 811 41 360

The second file2 has text like:

VA1419A 671 631 1622 151 12 360

VA1437A 537 481 1139 111 10 360

VA0015A 1160 1157 3703 412 10 360

VA1900B 338 323 978 82 10 360

VA0343B 948 947 4136 324 10 360

VA0337B 1821 1815 8802 811 10 360

I need a small script to look for matching text in column 1 of each file.
File1 should be matched against file2 since it is the newest.

If it finds a match, then it needs to look at field 6 of both matching
lines.

If field 6 of file1 is greater than field 6 of file2 by 30 or more, I need
it to send fields 1 and 6 of file1 to final_drops.txt.

So, comparing the two examples above, it would find the lines below
(43 - 12 = 31 and 41 - 10 = 31) and send them to final_drops.txt:

VA1419A 43
VA0337B 41

As a side note, the text in column 1 of each file will not always start
with VA; it may be something else.

Thanks for any assistance as always!
Beaster
 
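Something like this?
awk '
# file2 (read first): remember field 6 for each name in field 1
FNR==NR && NF>5{a[$1]=$6;next}
# file1: names also in file2 whose field 6 rose by 30 or more
NF>5 && ($1 in a) && $6+0>=a[$1]+30{print $1,$6}
' file2 file1 > final_drops.txt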
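A quick way to check the one-liner against the sample data from the question. This is a sketch assuming a POSIX shell with `mktemp`; the awk program differs from the bare version in two small ways: `($1 in a)` makes sure a name that is missing from file2 is never reported (otherwise `a[$1]` is empty and compares as 0), and `>=` makes a difference of exactly 30 count, matching "30 or more".

```shell
#!/bin/sh
# Scratch directory for the sample files (any temporary location works).
tmp=$(mktemp -d)

# Older data (file2), copied from the question, blank lines included.
cat > "$tmp/file2" <<'EOF'
VA1419A 671 631 1622 151 12 360

VA1437A 537 481 1139 111 10 360

VA0015A 1160 1157 3703 412 10 360

VA1900B 338 323 978 82 10 360

VA0343B 948 947 4136 324 10 360

VA0337B 1821 1815 8802 811 10 360
EOF

# Newer data (file1), copied from the question.
cat > "$tmp/file1" <<'EOF'
VA1419A 671 631 1622 151 43 360

VA1439A 537 481 1139 111 10 360

VA1433A 1160 1157 3703 412 10 360

VA1373B 338 323 978 82 10 360

VA0345B 948 947 4136 324 10 360

VA0337B 1821 1815 8802 811 41 360
EOF

# file2 is read first to build the lookup table, then file1 is compared.
result=$(awk '
# file2: remember field 6 for each name in field 1
FNR==NR && NF>5 { a[$1] = $6; next }
# file1: names present in both files whose field 6 rose by 30 or more
NF>5 && ($1 in a) && $6+0 >= a[$1]+30 { print $1, $6 }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
```

With the question's data this prints the two lines the original poster expected, VA1419A 43 and VA0337B 41.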

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
It looks good, I have not tried it yet though.

Can you show me how it should be written if I want to run it from a script file on the command line?

Like:
nawk -f file file
 
Beaster, after all the help we have given you over the years, you should be an awk expert. :) Just put the awk commands

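# file2 (read first): remember field 6 for each name in field 1
FNR==NR && NF>5{a[$1]=$6;next}
# file1: names also in file2 whose field 6 rose by 30 or more
NF>5 && ($1 in a) && $6+0>=a[$1]+30{print $1,$6}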

in a file, beaster.awk say, and run it by entering

awk -f beaster.awk file2 file1 > final_drops.txt
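If the limit of 30 ever needs to change, one option is to pass it in with awk's standard `-v` option instead of editing the script. A sketch, assuming a POSIX shell with `mktemp`; the variable name `threshold` and its default of 30 are hypothetical choices, and the program keeps the `($1 in a)` guard so only names present in both files are compared.

```shell
#!/bin/sh
# Work in a scratch directory so no real file1/file2 gets overwritten.
work=$(mktemp -d)
cd "$work" || exit 1

# The program file, named beaster.awk as suggested above.
cat > beaster.awk <<'EOF'
# Default to 30 when no -v threshold=... is given on the command line.
BEGIN { if (threshold == "") threshold = 30 }
# First file (file2): remember field 6 for each name in field 1.
FNR==NR && NF>5 { a[$1] = $6; next }
# Second file (file1): names in both files whose field 6 rose by threshold or more.
NF>5 && ($1 in a) && $6+0 >= a[$1]+threshold { print $1, $6 }
EOF

# Two rows from each of the question's sample files.
printf 'VA1419A 671 631 1622 151 12 360\nVA0337B 1821 1815 8802 811 10 360\n' > file2
printf 'VA1419A 671 631 1622 151 43 360\nVA0337B 1821 1815 8802 811 41 360\n' > file1

awk -v threshold=30 -f beaster.awk file2 file1 > final_drops.txt
cat final_drops.txt
```

The same file works with nawk (`nawk -v threshold=35 -f beaster.awk file2 file1`). Because `-v` assignments take effect before the BEGIN block runs, the default only applies when no value is passed.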



CaKiwi
 
I am having an issue running it like CaKiwi's example.

I think it is failing because it thinks the first field is too long.

vaossws03{root} #: nawk -f drop_nawk drops_temp3 testfile > final.txt
nawk: input record `VA0333C ...' too long
input record number 0, file testfile
source line number 1

Any ideas?
 
File 1 (Newest)

VA0333C 400 385 1547 150 13 360

VA1008C 88 82 522 42 8 360

VA0333B 340 325 1147 79 8 360

VA4350B 13 13 78 5 6 359

VA4350A 22 21 153 14 6 359

VA1437A 184 168 432 53 6 360

File 2 (Oldest)
VA0333C 400 385 1547 150 55 360

VA1008C 88 82 522 42 8 360

VA0333B 340 325 1147 79 8 360

VA4350B 13 13 78 5 6 359

VA4350A 22 21 153 14 36 359

VA1437A 184 168 432 53 6 360

VA1419A 285 276 511 96 6 360

VA1399C 137 126 278 39 6 360

 
My first guess is that there is something wrong with the end of line separator. Try

od -c drops_temp3

and post the first few lines. This will tell us if lines are ended with a linefeed.
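Besides `od -c`, counting the line-feed and carriage-return bytes tells the same story at a glance: a Unix text file has one LF per line and no CRs, a DOS file has equal numbers of each, and a file with no LFs at all is a single record as far as awk is concerned. A sketch using only `tr` and `wc`; `count_line_endings` is a hypothetical helper name, not a standard command.

```shell
#!/bin/sh
# Print "<file>: LF=<n> CR=<m>" for each file named as an argument.
# count_line_endings is a hypothetical helper, not a standard command.
count_line_endings() {
    for f in "$@"; do
        # Keep only LF bytes, count them, strip any padding wc adds.
        lf=$(tr -cd '\n' < "$f" | wc -c | tr -d ' ')
        # Keep only CR bytes and count them the same way.
        cr=$(tr -cd '\r' < "$f" | wc -c | tr -d ' ')
        printf '%s: LF=%s CR=%s\n' "$f" "$lf" "$cr"
    done
}

# Demo on three tiny files with different line endings.
tmp=$(mktemp -d)
printf 'a b\nc d\n' > "$tmp/unix.txt"      # Unix: LF only
printf 'a b\r\nc d\r\n' > "$tmp/dos.txt"   # DOS: CR LF
printf 'a b c d' > "$tmp/nonl.txt"         # no line terminator at all
count_line_endings "$tmp/unix.txt" "$tmp/dos.txt" "$tmp/nonl.txt"
```

On the files from this thread the call would be `count_line_endings drops_temp3 testfile`.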

CaKiwi
 
0000000 V A 0 3 3 3 C
0000020 4 0 0 3 8 5
0000040 1 5 4 7 1 5 0
0000060 1 3 3 6 0
0000100 \r \n V A 1 0 0 8 C
0000120 8 8
0000140 8 2 5 2 2
0000160 4 2 8 3
0000200 6 0 \r \n V A 0 3 3 3 B
0000220 3 4 0
0000240 3 2 5 1 1 4 7
0000260 7 9 8
0000300 3 6 0 \r \n V A 4 3 5 0 B
0000320 1 3
0000340 1 3 7 8
0000360 5 6
0000400 3 5 9 \r \n V A 4 3 5 0 A
0000420 2 2
0000440 2 1 1 5 3
0000460 1 4 6
0000500 3 5 9 \r \n V A 1 4 3 7
0000520 A 1
0000540 8 4 1 6 8 4
 
Seems OK.
And this?
od -c testfile

Hope This Helps, PH.
 
The lines in the file are separated by a carriage return and a line feed which could be causing a problem. Try running dos2unix on it. Or maybe add

BEGIN { RS="\r\n" }

at the beginning of your awk program.
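One caveat on `RS="\r\n"`: POSIX leaves the behaviour of a multi-character RS unspecified, and some older awks, including traditional nawk, use only its first character. A more portable option, sketched here under the assumption of a POSIX shell and awk, is to strip a trailing carriage return from each record inside the program; modifying `$0` with `sub()` makes awk re-split the fields. The sample files are DOS-style copies of rows from the question.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# DOS-style (CR LF) copies of rows from the question's sample data.
printf 'VA1419A 671 631 1622 151 12 360\r\nVA0337B 1821 1815 8802 811 10 360\r\n' > "$tmp/file2"
printf 'VA1419A 671 631 1622 151 43 360\r\nVA1439A 537 481 1139 111 10 360\r\n' > "$tmp/file1"

result=$(awk '
# Strip a trailing carriage return; changing $0 makes awk re-split the fields.
{ sub(/\r$/, "") }
# First file (file2): remember field 6 for each name.
FNR==NR && NF>5 { a[$1] = $6; next }
# Second file (file1): names in both files whose field 6 rose by 30 or more.
NF>5 && ($1 in a) && $6+0 >= a[$1]+30 { print $1, $6 }
' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$result"
```

Without the `sub()`, the CR would stay glued to the last field (`360\r`), which is harmless for this particular comparison but easy to trip over later.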

CaKiwi
 
CaKiwi,
I added the line above to the nawk program, like this:

BEGIN { RS="\r\n" }
FNR==NR && NF>5{a[$1]=$6;next}
NF>5{if($6+0>a[$1]+30)print $1,$6}

and got this back

vaossws03{root} #: nawk -f drop_nawk drops_temp3 test1 > test2
nawk: input record `VA0333C ...' too long
input record number 0, file test1
source line number 3
 
It is now giving the error for source line 3 which indicates that the second file could be the problem. Post the output of

od -c testfile

and try running dos2unix on both files.

CaKiwi
 
0000000 V A 0 3 3 3 C
0000020 4 0 0 3 8 5
0000040 1 5 4 7 1 5 0
0000060 1 3 3 6 0
0000100 V A 1 0 0 8 C
0000120 8 8 8 2
0000140 5 2 2 4 2
0000160 8 3 6 0
0000200 V A 0 3 3 3 B
0000220 3 4 0 3 2 5
0000240 1 1 4 7 7 9
0000260 5 5 3 6 0
0000300 V A 4 3 5 0 B
0000320 1 3 1 3
0000340 7 8 5
0000360 6 3 5 9
0000400 V A 4 3 5 0 A
0000420 2 2 2 1
0000440 1 5 3 1 4
0000460 6 3 5 9
0000500 V A 1 4 3 7 A
0000520 1 8 4 1 6 8
0000540 4 3 2 5 3
0000560 6 3 6 0
0000600 V A 1 4 1 9 A
0000620 2 8 5 2 7 6
0000640 5 1 1 9 6
0000660 6 3 6 0


dos2unix
VA0333C 400 385 1547 150 13 360
VA1008C 88 82 522 42 8 360
VA0333B 340 325 1147 79 8 360
VA4350B 13 13 78 5 6 359
VA4350A 22 21 153 14 6 359
VA1437A 184 168 432 53 6 360
VA1419A 285 276 511 96 6 360
VA1399C 137 126 278 39 6 360
VA0060C 14 14 80 3 6 360
VA5338B 228 228 1116 99 5 359
 
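It seems testfile has no newline characters at all, so nawk reads the whole file as one record, which is too long for it. You will have to split it into lines first; have a look at the fold command.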
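Reading the offsets in the `od -c` dump of testfile (they are octal, 16 bytes per dump line), each record starts 0100 octal = 64 bytes after the previous one and no `\n` appears, so the file looks like fixed-width 64-byte records with no line terminators. Assuming that width holds for the whole file, `fold -w 64` should restore one record per line. The sketch below builds a small stand-in file with `printf '%-64s'` to mimic that layout; it is not the real testfile.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Stand-in for testfile: each record space-padded to 64 bytes, no newlines.
printf '%-64s' \
    'VA0333C 400 385 1547 150 13 360' \
    'VA1008C 88 82 522 42 8 360' \
    'VA0333B 340 325 1147 79 8 360' > "$tmp/testfile"

# Break the stream back into 64-byte lines.
fold -w 64 "$tmp/testfile" > "$tmp/testfile.fixed"

# Each line should now be an ordinary awk record with 7 fields.
awk '{ print NR, NF, $1 }' "$tmp/testfile.fixed"
```

On the real data that would be `fold -w 64 testfile > testfile.fixed`, then running the comparison on testfile.fixed. The trailing padding spaces are harmless because awk's default field splitting ignores them.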

Hope This Helps, PH.
 
Would it be easier to first pull fields 1 and 6 out of each file into new files, and then do the comparison on those?

If so, how do I cat only those two fields to a new file?
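`cat` copies files whole and cannot pick out fields. awk can, and because the columns here are separated by varying runs of spaces, it is a better fit than `cut -d' '`, which treats every single space as a delimiter. A sketch, assuming a POSIX shell; the output name `file1.f16` is a hypothetical choice, and one sample row has an extra space added to show that runs of blanks are handled. Note that this does not by itself fix testfile, which still needs its missing newlines restored before awk can read it record by record.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Rows from the question, with a blank line and a doubled space in row 1.
printf 'VA1419A  671 631 1622 151 43 360\n\nVA0337B 1821 1815 8802 811 41 360\n' > "$tmp/file1"

# Keep only fields 1 and 6; NF>5 skips blank or short lines.
awk 'NF > 5 { print $1, $6 }' "$tmp/file1" > "$tmp/file1.f16"
cat "$tmp/file1.f16"
```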
 
Why are your two files so different in structure?

Hope This Helps, PH.
 