Compare Fields from two text files using key columns

Status
Not open for further replies.

Awksome

Programmer
Jul 17, 2009
11
US
Hi All,

I have two files to compare. Each has 10 columns; the first 4 columns together form the key, and the remaining 6 columns hold monetary values.

I want to read one file into a hash, check whether each key exists in file 2, compare the values in the remaining 6 columns, and report the differences found.

The files are comma-separated and do not have a header.

Here is the sample file:
File A:
Row1: abcd,abrd,fun,D000,$15,$236,$217,$200,$200,$200
Row2: dear,dare,tun,D000,$12.00405,$234.08976,$212.09876,$200,$200,$200

File B:
Row1: abcd,abrd,fun,D000,$12,$234,$212,$200,$200,$200
Row2: dear,dare,tun,D000,$12.00405,$234.08976,$212.09876,$200,$200,$200

Output:
Difference found for index abcd,abrd,fun,D000 for field 5,6 and 7

Any help would be appreciated. I am able to come up with the script in Bash, but not very comfortable with the concept of Hash in Perl and also setting up key index columns.

Thanks!
 
Hi

Like this ?
Code:
perl -naF, -e '$f=$ARGV unless$f;chomp(@F);$k=join",",@F[0..3];@v=@F[4..$#F];if($f eq$ARGV){$f{$k}=[@v]}else{if($f{$k}){@d=();for($i=0;$i<scalar@v;$i++){push@d,$i+5 if $v[$i]ne$f{$k}[$i]}print"Difference found for index $k for field ",join(", ",@d),"\n"if@d}};' /path/to/FileA /path/to/FileB
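
For anyone who finds the one-liner hard to read, the same idea can be spelled out as a script. This is only a sketch under some assumptions: the sub names parse_line and compare_lines and the script name compare.pl are made up for illustration, the files hold plain comma-separated rows (no "Row1:" prefix, no quoted commas, at least 4 columns per row), and Perl is 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Split one comma-separated line into (key, values).
# The first four columns form the key; the rest are the values to compare.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my @f = split /,/, $line, -1;    # limit -1 keeps trailing empty fields
    return ( join(',', @f[0..3]), [ @f[4..$#f] ] );
}

# Return one report string per key of file B that also exists in file A
# but has different values in at least one column.
sub compare_lines {
    my ($a_lines, $b_lines) = @_;
    my %seen;
    for my $line (@$a_lines) {
        my ($key, $vals) = parse_line($line);
        $seen{$key} = $vals;
    }
    my @report;
    for my $line (@$b_lines) {
        my ($key, $vals) = parse_line($line);
        next unless exists $seen{$key};
        my $old  = $seen{$key};
        my $last = $#$vals > $#$old ? $#$vals : $#$old;
        # Collect the 0-based positions where the two rows differ.
        my @diff = grep { ($vals->[$_] // '') ne ($old->[$_] // '') } 0 .. $last;
        # Values start at column 5, so add 5 to get the field number.
        push @report, "Difference found for index $key for field "
            . join(', ', map { $_ + 5 } @diff) if @diff;
    }
    return @report;
}

# Usage: perl compare.pl FileA FileB   (compare.pl is a hypothetical name)
if (@ARGV == 2) {
    my @lines;
    for my $path (@ARGV) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        push @lines, [ <$fh> ];
        close $fh;
    }
    print "$_\n" for compare_lines(@lines);
}
```

Keys that appear in only one of the two files are silently skipped here, as in the one-liner.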

Feherke.
 
If you're new to Perl, the above may be a little hardcore. Here is a fairly brute-force way to read the data in.
Code:
open(FH, $first_file_path) or die "Cannot open $first_file_path: $!";
while ($inline = <FH>) {
   @inarr = split(/,/, $inline);
   chomp ($inarr[$#inarr]);  # Chomp the last field after the split; don't chomp $inline first
   # The first four columns are the nested keys. $#inarr is the last index, i.e. scalar(@inarr) - 1
   @{$first_file_data{$inarr[0]}{$inarr[1]}{$inarr[2]}{$inarr[3]}} = @inarr[4..$#inarr];
}
close(FH);

Read in the second file in similar fashion. Note that if you chomp $inline first, data lines that look like:
a,b,c,d,e
f,g,h,i,
will split differently: the first returns 5 values and the second returns 4, because split drops trailing empty fields. If you chomp the last value after the split instead, you get 5 values both times, with the last one being the empty string ''.
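
The difference is easy to check with a small test. The sketch below uses made-up sub names (chomp_then_split, split_then_chomp, split_keep_empty) and also shows a third option, passing a negative limit to split, which keeps trailing empty fields even when the line was chomped first.

```perl
use strict;
use warnings;

# chomp the whole line, then split: a trailing empty field is dropped
sub chomp_then_split {
    my ($line) = @_;
    chomp $line;
    my @f = split /,/, $line;
    return @f;
}

# split first, then chomp the last element: the "\n" keeps the last field alive
sub split_then_chomp {
    my ($line) = @_;
    my @f = split /,/, $line;
    chomp $f[-1];
    return @f;
}

# chomp, then split with limit -1: trailing empty fields are kept
sub split_keep_empty {
    my ($line) = @_;
    chomp $line;
    my @f = split /,/, $line, -1;
    return @f;
}

my $line = "f,g,h,i,\n";
printf "%d %d %d\n",
    scalar( my @x = chomp_then_split($line) ),
    scalar( my @y = split_then_chomp($line) ),
    scalar( my @z = split_keep_empty($line) );
# prints "4 5 5"
```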

The 'join' used in the first example is a more efficient way to set up the hash, but you lose some granularity when comparing it with other hashes. On the other hand, you'll have to do a little more work to compare the two files with the hash of hashes presented here, as in:
foreach $key1 (keys %first_file_data) {
   foreach $key2 (keys %{$first_file_data{$key1}}) {
      foreach $key3 (keys %{$first_file_data{$key1}{$key2}}) {
         # and so on
      }
   }
}
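
Carried through all four key levels, the walk might look like the sketch below. The sub names load_nested and compare_nested are invented for illustration, the input is assumed to be well-formed rows with at least 4 columns, and Perl 5.10 or later is assumed for the // operator. The lookup in the second structure steps down one level at a time so that missing keys are not autovivified.

```perl
use strict;
use warnings;

# Load lines into $data->{k1}{k2}{k3}{k4} = [ value columns ].
sub load_nested {
    my (@lines) = @_;
    my %data;
    for my $line (@lines) {
        my @f = split /,/, $line;
        chomp $f[-1];                 # chomp after the split, as discussed above
        $data{ $f[0] }{ $f[1] }{ $f[2] }{ $f[3] } = [ @f[4..$#f] ];
    }
    return \%data;
}

# Walk every key combination in $first and report fields that differ in $second.
sub compare_nested {
    my ($first, $second) = @_;
    my @report;
    for my $k1 (sort keys %$first) {
      for my $k2 (sort keys %{ $first->{$k1} }) {
        for my $k3 (sort keys %{ $first->{$k1}{$k2} }) {
          for my $k4 (sort keys %{ $first->{$k1}{$k2}{$k3} }) {
            my $mine = $first->{$k1}{$k2}{$k3}{$k4};

            # Step down level by level; stop as soon as a key is missing.
            my $theirs = $second;
            for my $k ($k1, $k2, $k3, $k4) {
                $theirs = $theirs->{$k};
                last unless defined $theirs;
            }
            next unless defined $theirs;    # key not present in the second file

            my $last = $#$mine > $#$theirs ? $#$mine : $#$theirs;
            my @diff = grep { ($mine->[$_] // '') ne ($theirs->[$_] // '') } 0 .. $last;
            push @report, "Difference found for index $k1,$k2,$k3,$k4 for field "
                . join(', ', map { $_ + 5 } @diff) if @diff;
          }
        }
      }
    }
    return @report;
}
```

The nesting pays off when you want per-level answers (for example, everything under a given first key); for a straight row-by-row comparison the joined key is less work.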
 