Compare Fields from two text files using key columns

Awksome · May 12, 2010

Hi All,

I have two files to compare. Each has 10 columns, and the first 4 columns together form the key index. The remaining columns hold monetary values.

I want to read one file into a hash, check whether each key is present in file 2, compare the values in the remaining 6 columns, and report any differences found.

The files are comma-separated and do not have a header row.

Here are the sample files:
File A:
Row1: abcd,abrd,fun,D000,$15,$236,$217,$200,$200,$200
Row2: dear,dare,tun,D000,$12.00405,$234.08976,$212.09876,$200,$200,$200

File B:
Row1: abcd,abrd,fun,D000,$12,$234,$212,$200,$200,$200
Row2: dear,dare,tun,D000,$12.00405,$234.08976,$212.09876,$200,$200,$200

Output:
Difference found for index abcd,abrd,fun,D000 for field 5,6 and 7

Any help would be appreciated. I can write the script in Bash, but I am not very comfortable with hashes in Perl or with setting up the key index columns.

Thanks!

feherke · May 12, 2010

Hi

Like this?

Code:

perl -naF, -e '$f=$ARGV unless$f;chomp(@F);$k=join",",@F[0..3];@v=@F[4..$#F];if($f eq$ARGV){$f{$k}=[@v]}else{if($f{$k}){@d=();for($i=0;$i<scalar@v;$i++){push@d,$i+5 if $v[$i]ne$f{$k}[$i]}print"Difference found for index $k for field ",join(", ",@d),"\n"if@d}};' /path/to/FileA /path/to/FileB
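
For readers who find the one-liner dense, the same idea can be spelled out as a script. This is only a sketch: the sub names (load_file, diff_fields, compare_files) are made up for illustration, and it assumes every row has at least the four key columns and that keys are unique within a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a comma-separated file into a hash keyed on the first four columns.
# Each value is an array reference holding the remaining value columns.
sub load_file {
    my ($path) = @_;
    my %rows;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';                  # skip blank lines
        my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
        my $key = join ',', @fields[0 .. 3];
        $rows{$key} = [ @fields[4 .. $#fields] ];
    }
    close $fh;
    return \%rows;
}

# Return the 1-based field numbers at which two rows of values differ.
sub diff_fields {
    my ($old, $new) = @_;
    my $last = $#{$old} > $#{$new} ? $#{$old} : $#{$new};
    my @diff;
    for my $i (0 .. $last) {
        my $x = defined $old->[$i] ? $old->[$i] : '';
        my $y = defined $new->[$i] ? $new->[$i] : '';
        push @diff, $i + 5 if $x ne $y;       # values start at field 5
    }
    return @diff;
}

# Compare two files and return one report line per key whose values differ.
# Keys that appear in only one of the files are silently ignored.
sub compare_files {
    my ($path_a, $path_b) = @_;
    my $rows_a = load_file($path_a);
    my $rows_b = load_file($path_b);
    my @report;
    for my $key (sort keys %$rows_b) {
        next unless exists $rows_a->{$key};
        my @diff = diff_fields($rows_a->{$key}, $rows_b->{$key});
        push @report,
            "Difference found for index $key for field " . join(', ', @diff)
            if @diff;
    }
    return @report;
}

# Usage: perl compare.pl /path/to/FileA /path/to/FileB
if (@ARGV == 2) {
    print "$_\n" for compare_files(@ARGV);
}
```

The comparison is a plain string comparison, so $12 and $12.00 would be reported as different.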

Feherke.

http://free.rootshell.be/~feherke/

PinkeyNBrain · May 12, 2010

If you're new to Perl, the above may be a little hardcore. Here is a fairly brute-force way to get the data read in.

Code:

open(FH, '<', $first_file_path) or die "Cannot open $first_file_path: $!";
while ($inline = <FH>) {
   @inarr = split(/,/, $inline);
   chomp ($inarr[$#inarr]);  # Don't chomp $inline first
   # $#inarr is the index of the last element, i.e. scalar(@inarr) - 1
   @{$first_file_data{$inarr[0]}{$inarr[1]}{$inarr[2]}{$inarr[3]}} = @inarr[4..$#inarr];
}
close(FH);

Read in the second file in similar fashion. Note that if you chomp $inline first, data lines that look like:
a,b,c,d,e
f,g,h,i,
will split somewhat differently, because split drops trailing empty fields. The first will return 5 values, the second will return 4. If you chomp the last value after the split instead, you'll get 5 values both times, with the last one on the second line being an empty string ('').
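
A small sketch of that behaviour, for anyone who wants to try it. The sub names are hypothetical, and the third variant (a negative LIMIT passed to split, which also keeps trailing empty fields) is an addition that was not mentioned above.

```perl
use strict;
use warnings;

# Chomp the whole line, then split: trailing empty fields are dropped.
sub split_chomp_first {
    my ($line) = @_;
    chomp $line;
    return split /,/, $line;
}

# Split first, then chomp only the last field. The newline keeps the final
# field non-empty during the split, so it survives and becomes ''.
sub split_chomp_last {
    my ($line) = @_;
    my @fields = split /,/, $line;
    chomp $fields[-1] if @fields;
    return @fields;
}

# Chomp first, but pass a negative LIMIT so split keeps trailing empties.
sub split_keep_trailing {
    my ($line) = @_;
    chomp $line;
    return split /,/, $line, -1;
}

for my $line ("a,b,c,d,e\n", "f,g,h,i,\n") {
    my @first = split_chomp_first($line);
    my @last  = split_chomp_last($line);
    my @keep  = split_keep_trailing($line);
    printf "%d %d %d\n", scalar @first, scalar @last, scalar @keep;
}
# prints:
# 5 5 5
# 4 5 5
```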

The 'join' used in the first example is a more efficient way to set up the hash, but you lose some granularity when comparing it with other hashes. On the other hand, you'll have to do a little more work to compare the two files with the hash of hashes presented here, as in:
foreach $key1 (keys %first_file_data) {
   foreach $key2 (keys %{$first_file_data{$key1}}) {
      foreach $key3 (keys %{$first_file_data{$key1}{$key2}}) {
         # and so on
      }
   }
}
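
Carried through all four key levels, the loop above might end up something like the sketch below. The sub name compare_nested is made up, and it assumes both files were read into the same four-level structure with an array of values at the innermost level.

```perl
use strict;
use warnings;

# Walk the first hash of hashes and compare each leaf array with the
# matching leaf in the second. Returns one report line per differing key.
sub compare_nested {
    my ($first, $second) = @_;
    my @report;
    foreach my $k1 (sort keys %$first) {
        foreach my $k2 (sort keys %{ $first->{$k1} }) {
            foreach my $k3 (sort keys %{ $first->{$k1}{$k2} }) {
                foreach my $k4 (sort keys %{ $first->{$k1}{$k2}{$k3} }) {
                    # Test each level with exists so that looking up a
                    # missing key does not create it in the second hash.
                    next unless exists $second->{$k1}
                        && exists $second->{$k1}{$k2}
                        && exists $second->{$k1}{$k2}{$k3}
                        && exists $second->{$k1}{$k2}{$k3}{$k4};
                    my $old = $first->{$k1}{$k2}{$k3}{$k4};
                    my $new = $second->{$k1}{$k2}{$k3}{$k4};
                    my $last = $#{$old} > $#{$new} ? $#{$old} : $#{$new};
                    my @diff;
                    for my $i (0 .. $last) {
                        my $x = defined $old->[$i] ? $old->[$i] : '';
                        my $y = defined $new->[$i] ? $new->[$i] : '';
                        push @diff, $i + 5 if $x ne $y;  # values start at field 5
                    }
                    push @report,
                        "Difference found for index $k1,$k2,$k3,$k4 for field "
                        . join(', ', @diff)
                        if @diff;
                }
            }
        }
    }
    return @report;
}

# Sample data taken from the first row of each file in the question.
my (%first_file_data, %second_file_data);
@{ $first_file_data{abcd}{abrd}{fun}{D000} }  = qw($15 $236 $217 $200 $200 $200);
@{ $second_file_data{abcd}{abrd}{fun}{D000} } = qw($12 $234 $212 $200 $200 $200);

print "$_\n" for compare_nested(\%first_file_data, \%second_file_data);
# prints: Difference found for index abcd,abrd,fun,D000 for field 5, 6, 7
```

The exists checks matter here: a bare lookup like $second->{$k1}{$k2}{$k3}{$k4} on a missing key would quietly create the intermediate levels in the second hash.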
