understanding check between two files

LAdProg2005 · Nov 18, 2009

I am using a simple script to check the differences between two files, as below.

Code:

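#!/usr/bin/perl
use strict;
use warnings;

# Open both files named on the command line, stopping on failure.
open my $fh_a, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
open my $fh_b, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";

# Read each file as a list of lines without trailing newlines.
chomp( my @a = <$fh_a> );
chomp( my @b = <$fh_b> );

# Make hash of B
my %b = map { $_ => 1 } @b;

# Everything in A not in B
my @res = grep { !exists $b{$_} } @a;
print join( "\n", @res ), "\n";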

File 1:
apple|1
banana|1
kiwi|3
File 2:
banana|2
kiwi|3

When I run it, the result is:
apple|1 - removed in file 2
banana|1 - changed in file 2

Looking at the output alone, you can't tell whether a line was updated or deleted.

How do I differentiate between a record that was deleted and one that was changed? I can tell with the data above because it is only a couple of lines, but with many lines it would not be practical to eyeball.

thanks,
LAd

Kirsle · Nov 18, 2009

If this is any unix-flavored system (including OS X) you'll probably have a `diff` command... you can just use that.

Code:

diff -bu file1.txt file2.txt

It will tell you the differences between the files. The general usage is:

Code:

diff [options] <original file> <new file>

So if you were comparing two versions of a source file, where file 1 is the original/old version and file 2 is a newer version, it would show + marks on lines that were added and - marks on lines that were deleted.

Kirsle.net | My personal homepage

Code:

perl -e '$|=$i=1;print" oo\n<|>\n_|_";x:sleep$|;print"\b",$i++%2?"/":"_";goto x;'

LAdProg2005 · Nov 19, 2009

Well, the process will be automated, as the files need to be checked often, so the check needs to go in a script.

Kirsle · Nov 19, 2009

You can automate it via cron, or just make your perl script run `diff` via a system call.

Code:

my $diff_output = `diff -bu file1 file2`;
print $diff_output;
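
The backtick call above returns the whole unified diff as one string. As a rough sketch (not part of the original reply), the string could be split into added and removed lines by their leading `+` or `-`. The helper name `classify_diff_lines` and the sample text are assumptions made for illustration; real `diff -u` headers also carry timestamps, and a removed line whose content itself starts with `--` would be mistaken for a header by this simple check.

```perl
use strict;
use warnings;

# Hypothetical helper: split unified-diff text into added and removed lines.
sub classify_diff_lines {
    my ($diff_text) = @_;
    my (@added, @removed);
    for my $line (split /\n/, $diff_text) {
        # Skip the "--- file1" / "+++ file2" headers and "@@ ... @@" hunk markers.
        next if $line =~ /^(?:---|\+\+\+|@@)/;
        if    ($line =~ /^\+(.*)/) { push @added,   $1 }
        elsif ($line =~ /^-(.*)/)  { push @removed, $1 }
        # Lines starting with a space are unchanged context and are ignored.
    }
    return (\@added, \@removed);
}

# Sample text shaped like `diff -u` output for the two files in this thread.
my $sample = <<'END_DIFF';
--- file1.txt
+++ file2.txt
@@ -1,3 +1,2 @@
-apple|1
-banana|1
+banana|2
 kiwi|3
END_DIFF

my ($added, $removed) = classify_diff_lines($sample);
print "removed: $_\n" for @$removed;
print "added: $_\n"   for @$added;
```

Note that a changed record shows up as one removed line plus one added line, so telling "updated" apart from "deleted" still needs a comparison by key.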


LAdProg2005 · Nov 20, 2009

OK, so with that logic, how do I process the output so that I can say:

if the line was removed, do one thing
if the line is new, do a second thing
if the line was updated, do a third thing

prex1 · Nov 20, 2009

You should better specify your goal and the structure of your files. Do you want to execute a command for every record that's not the same in both? Or do you want to create a 3rd file with everything that's in both? Or one of the two is a master file and the second one should be modified with the values of the master? Or...
If both files are a collection of key-value pairs as in your example, one could create a single hash containing as values an array with the two values from each file; with this you can do what you want (but please be clear on what you want!).
Note BTW that the existence of a key in a hash should not be tested with `defined`, but with `exists`.
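
To illustrate the `exists` versus `defined` point, here is a small sketch with made-up keys (the hash `%seen` is only an example, not from the original script). A key can be present in a hash while its value is undefined, and only `exists` reports that correctly. In the original script every value is 1, so `defined` happens to work there, but `exists` states the intent directly.

```perl
use strict;
use warnings;

# "pear" is present as a key, but its value is undef.
my %seen = (apple => 1, pear => undef);

for my $key (qw(apple pear plum)) {
    printf "%-5s exists=%d defined=%d\n",
        $key,
        exists( $seen{$key} )  ? 1 : 0,
        defined( $seen{$key} ) ? 1 : 0;
}
```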

Franco

http://www.xcalcs.com : Online engineering calculations
http://www.megamag.it : Magnetic brakes for fun rides
http://www.levitans.com : Air bearing pads

LAdProg2005 · Nov 23, 2009

You should better specify your goal and the structure of your files.

My goal is to compare two files and find the differences between records. File 1 is the master and file 2 is used to update the records. File 2 indicates whether a record is updated (the record in the master needs to be updated), removed (the record is no longer needed, hence not in file 2), or new (the record doesn't exist in the master file).

(but please be clear on what you want!).

OK, let me try to explain again.
Basically, I have two files of records. File 1 is older and file 2 is newer/updated. (Records don't have to be two fields long; they could have three or four, with a unique key in one column, in this case the first.)

File 1:
apple|1
banana|1
kiwi|3

File 2:
banana|2
kiwi|3
strawberry|1

file 3: or local var that can be accessed later
->create a third file or logic that stores info
banana|2 was updated
kiwi|3 was ignored no changes to it
strawberry|1 was inserted
apple|1 was deleted

Annihilannic · Nov 23, 2009

If you have the GNU diff available, you can use format specifiers like this:

Code:

diff --old-line-format="%l was deleted%c'\012'" --new-line-format="%l was inserted%c'\012'" --unchanged-line-format="%l was ignored no changes to it%c'\012'" file1 file2

# or like this, if you prefer the format
diff --old-line-format="%l was deleted
" --new-line-format="%l was inserted
" --unchanged-line-format="%l was ignored no changes to it
" file1 file2
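
Since diff compares whole lines, a changed record comes out of the commands above as one "was deleted" line plus one "was inserted" line. A possible post-processing sketch in Perl folds such a pair into a single "was updated" line when both share the same key (first column). The helper name `merge_updates` and the hard-coded input lines are assumptions for illustration; the input is only shaped like what the format strings above would produce for the sample files.

```perl
use strict;
use warnings;

# Hypothetical helper: fold a "deleted" + "inserted" pair with the same key
# into one "updated" line. All other lines pass through unchanged.
sub merge_updates {
    my @lines = @_;
    my (%deleted, %inserted);

    # First pass: note which keys were reported as deleted or inserted.
    for my $line (@lines) {
        my ($key) = split /\|/, $line;
        $deleted{$key}  = 1 if $line =~ / was deleted$/;
        $inserted{$key} = 1 if $line =~ / was inserted$/;
    }

    # Second pass: drop the old half of each pair and relabel the new half.
    my @out;
    for my $line (@lines) {
        my ($key) = split /\|/, $line;
        next if $line =~ / was deleted$/ && $inserted{$key};
        if ($line =~ / was inserted$/ && $deleted{$key}) {
            (my $copy = $line) =~ s/ was inserted$/ was updated/;
            push @out, $copy;
            next;
        }
        push @out, $line;
    }
    return @out;
}

my @report = merge_updates(
    'apple|1 was deleted',
    'banana|1 was deleted',
    'banana|2 was inserted',
    'kiwi|3 was ignored no changes to it',
    'strawberry|1 was inserted',
);
print "$_\n" for @report;
```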

Annihilannic.

bichonfrise74 · Nov 23, 2009

I'm not sure why you do not want to use the diff function. Anyway, here's a crude way of doing what you want.

Code:

#!/usr/bin/perl

use strict;

my $file_1 =<<file_a;
apple|1
banana|1
kiwi|3
file_a

my $file_2 =<<file_b;
banana|2
kiwi|3
strawberry|1
file_b

my %record_1;
open( my $file, '<', \$file_1 ) or die "Error: Cannot open file 1: $!\n";
while (<$file>) {
    chomp;
    my ($key, $val) = split( /\|/ );
    $record_1{$key} = $val;
}
close( $file );

my %record_2;
# Reuse $file instead of declaring it a second time with "my".
open( $file, '<', \$file_2 ) or die "Error: Cannot open file 2: $!\n";
while (my $line = <$file>) {
    chomp $line;
    my ($key, $val) = split( /\|/, $line );
    $record_2{$key} = $val;

    # Keys missing from file 1 are reported as inserted further down.
    next unless exists $record_1{$key};

    # Compare as strings so non-numeric values work too.
    if ( $record_1{$key} eq $val ) {
        print "$line has not changed.\n";
    }
    else {
        print "$line was updated.\n";
    }
}
close( $file );

# Keys only in file 2 were inserted.
for my $i (sort keys %record_2) {
    print "$i|$record_2{$i} was inserted.\n" unless exists $record_1{$i};
}

# Keys only in file 1 were deleted.
for my $i (sort keys %record_1) {
    print "$i|$record_1{$i} was deleted.\n" unless exists $record_2{$i};
}

prex1 · Nov 23, 2009

Well...it seems to me that your master is file2, as you are updating file1 with the information contained in file2.
Anyway the logic I would use is as proposed by bichonfrise74 (and by me above), except that it can be simplified as follows:

Code:

use strict;
my$file_1 =<<file_a;
apple|1
banana|1
kiwi|3
file_a

my$file_2 =<<file_b;
banana|2
kiwi|3
strawberry|1
file_b

my (%result, $key, $val, $file);

# Start by assuming every record in file 2 is new.
open($file, '<', \$file_2) or die "Error: Cannot open file 2: $!\n";
while (<$file>) {
  chomp;
  ($key, $val) = split(/\|/);
  $result{$key} = [$val, 'was inserted'];
}
close($file);

# Records in file 1 either match, differ, or are missing from file 2.
open($file, '<', \$file_1) or die "Error: Cannot open file 1: $!\n";
while (<$file>) {
  chomp;
  ($key, $val) = split(/\|/);
  if (exists $result{$key}) {
    # String comparison, so non-numeric values are handled too.
    if ($val eq $result{$key}[0]) {
      $result{$key}[1] = 'was ignored no changes to it';
    } else {
      $result{$key}[1] = 'was updated';
    }
  } else {
    $result{$key} = [$val, 'was deleted'];
  }
}
close($file);

for (sort keys %result) {
  print "$_|@{$result{$_}}\n";
}

Franco


LAdProg2005 · Nov 24, 2009

Thank you both. Works very well. Thanks for being patient with me.

Just one quick question: the logic wouldn't work if the data looked like the sample below, would it? What would I have to change to take that into consideration? I am thinking ($key,$val)=split(/\|/); would need another variable, but would the comparison logic also have to be updated?

my$file_1 =<<file_a;
apple|1|s
banana|1|s
kiwi|3|s
file_a

my$file_2 =<<file_b;
banana|2|s
kiwi|3|b
strawberry|1|s
file_b

Thanks.
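
One possible way to handle records with more than two columns, sketched here under the assumption that the first column is always the unique key: give `split` a limit of 2 so everything after the first `|` stays together as one string, then compare that string with `eq`. The helper names `load_records` and `compare_records` are made up for this example and do not come from the replies above.

```perl
use strict;
use warnings;

# Hypothetical helper: map each key (first column) to the rest of the record.
sub load_records {
    my ($text) = @_;
    my %rec;
    for my $line (split /\n/, $text) {
        # Limit of 2: everything after the first "|" is kept as one string.
        my ($key, $rest) = split /\|/, $line, 2;
        $rec{$key} = $rest;
    }
    return %rec;
}

# Hypothetical helper: classify every key found in either set of records.
sub compare_records {
    my ($old, $new) = @_;    # hash references
    my %status;
    for my $key (keys %$new) {
        $status{$key} = !exists( $old->{$key} )      ? 'inserted'
                      : $old->{$key} eq $new->{$key} ? 'unchanged'
                      :                                'updated';
    }
    for my $key (keys %$old) {
        $status{$key} = 'deleted' unless exists( $new->{$key} );
    }
    return %status;
}

my %old = load_records("apple|1|s\nbanana|1|s\nkiwi|3|s\n");
my %new = load_records("banana|2|s\nkiwi|3|b\nstrawberry|1|s\n");
my %status = compare_records(\%old, \%new);
print "$_ $status{$_}\n" for sort keys %status;
```

With this approach the comparison logic itself does not need another variable per column, because any difference in any trailing column makes the two strings unequal.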

LAdProg2005 · Nov 24, 2009

Also, thanks to all who made suggestions. The best way to learn is to look at several different ways of doing the same thing. I am just new, so it takes me longer to figure out which way is better.

thanks again!!
