compare the contents of 2 directories

derekJon · Dec 5, 2001

I want to compare 2 arrays (@DAT and @DIR, which represent the files in 2 different directories) and return an array of entries in @DAT that don't exist in @DIR and an array of entries in @DIR that don't exist in @DAT. I have written the following code, which appears to work. However, I'm new to Perl and would like any hints/suggestions on how it might be improved. The arrays will be holding substantial amounts of data, so performance is an issue:

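my @DIR = ("one", "two", "three");
my @DAT = ("two", "three", "four");

# Remove every entry that appears in both arrays. What is left in
# @DAT is not in @DIR, and what is left in @DIR is not in @DAT.
# Note: this modifies both arrays in place.
for (my $cntDAT = 0; $cntDAT < scalar(@DAT); $cntDAT++) {

    for (my $cntDIR = 0; $cntDIR < scalar(@DIR); $cntDIR++) {

        # $DIR[...] (not @DIR[...]) is the single-element form
        if ($DIR[$cntDIR] eq $DAT[$cntDAT]) {

            splice(@DIR, $cntDIR, 1);   # drop the match from @DIR
            splice(@DAT, $cntDAT, 1);   # drop the match from @DAT
            $cntDAT--;                  # the next @DAT entry has shifted into this slot
            last;                       # stop scanning @DIR for this entry
        }
    }
}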
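Since performance is the main concern, one way to put numbers on it is Perl's core Benchmark module. The sketch below is a hypothetical comparison, not code from this thread: the file names are made up, and the nested loop is a simplified, non-destructive version of the approach above that only collects the entries of @DIR missing from @DAT. The hash version uses the lookup idea from the replies below.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: two overlapping lists of 1000 made-up file names.
my @DIR = map { "file$_" } 1 .. 1000;
my @DAT = map { "file$_" } 501 .. 1500;

# Nested-loop search: every @DIR entry is compared against @DAT,
# so the work grows roughly with scalar(@DIR) * scalar(@DAT).
sub only_in_dir_nested {
    my @result;
    ENTRY: for my $name (@DIR) {
        for my $other (@DAT) {
            next ENTRY if $name eq $other;   # found a match, skip this entry
        }
        push @result, $name;                 # no match anywhere in @DAT
    }
    return @result;
}

# Hash lookup: build the hash once, then each check is a single lookup,
# so the work grows roughly with scalar(@DIR) + scalar(@DAT).
sub only_in_dir_hash {
    my %in_dat;
    @in_dat{@DAT} = ();                      # hash slice: one key per @DAT entry
    return grep { !exists $in_dat{$_} } @DIR;
}

# Run each version for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    nested => \&only_in_dir_nested,
    hash   => \&only_in_dir_hash,
});
```

Timings vary by machine and data; the gap should widen as the arrays grow, because the nested loop's cost scales with the product of the two sizes while the hash version scales with their sum.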

latch · Dec 5, 2001

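The following is an alternative to your code. I have not benchmarked it, but this could be a bit faster, especially on large arrays: a hash lookup takes roughly constant time, whereas your nested loops compare each entry of @DAT against the entries of @DIR one by one.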

[tt]
# Elements in @DIR NOT IN @DAT

@DIR = ("one", "two", "three");
@DAT = ("two", "three", "four");

%temp=();
@temp{@DIR}=() ;

foreach(@DAT)
{
delete $temp{$_};
}

@list=sort (keys %temp);
print "Elements in \@DIR not present in \@DAT : @list";
[/tt]

A temporary hash, %temp, is created, and a key is made in %temp for every element in @DIR. After the line
[tt] @temp{@DIR}=();[/tt]
%temp has the keys 'one', 'two' and 'three'.
We then delete the keys in %temp that are elements in @DAT ('two' and 'three').
@list then holds the elements in @DIR not present in @DAT, here just 'one'.
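The code above answers one half of the question (entries in @DIR missing from @DAT). A minimal sketch of both halves, applying the same hash technique twice with the roles swapped, might look like this. It uses a hash-slice delete, delete @hash{LIST}, as a one-statement equivalent of the foreach/delete loop above; the hash and array names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @DIR = ("one", "two", "three");
my @DAT = ("two", "three", "four");

# Entries in @DIR that are not in @DAT (the approach shown above).
my %only_dir;
@only_dir{@DIR} = ();        # one key per @DIR entry
delete @only_dir{@DAT};      # hash-slice delete removes every @DAT entry at once
my @dir_not_dat = sort keys %only_dir;

# Entries in @DAT that are not in @DIR (same idea, roles swapped).
my %only_dat;
@only_dat{@DAT} = ();
delete @only_dat{@DIR};
my @dat_not_dir = sort keys %only_dat;

print "In \@DIR but not \@DAT: @dir_not_dat\n";   # prints "In @DIR but not @DAT: one"
print "In \@DAT but not \@DIR: @dat_not_dir\n";   # prints "In @DAT but not @DIR: four"
```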

HTH
regards
C "Brahmaiva satyam"
-Adi Shankara (788-820 AD)

toolkit · Dec 6, 2001

Another way:

Code:

@DIR = qw( one two three );
@DAT = qw( two three four );

foreach( @DIR ) { $seen{$_}++ };
foreach( @DAT ) { $seen{$_}++ };

@intersect = grep { $seen{$_} == 2 } keys %seen;
@union = keys %seen;

print "intersect: @intersect\n";
print "union: @union\n";

Cheers, Neil

derekJon · Dec 6, 2001

Neil

I'm a bit of a newbie @ Perl, so I'm not certain what your code is doing. When I ran it, it returned:

intersect: three two
union: one three two four

it should return:

one
four

any ideas?

toolkit · Dec 6, 2001

Sorry, I misunderstood your question. From set theory:
A union B: contains all elements that are in A or in B (or both)
A intersection B: contains all elements that are in both A and B
What you want is actually the symmetric difference of A and B, namely:
(A union B) - (A intersection B)
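Written out, the symmetric difference also splits into the two one-sided differences the original question asks for, which is why the two lists can be recovered by noting which array each name came from. Using the example arrays, with A = @DIR and B = @DAT:

```latex
A \,\triangle\, B = (A \cup B) \setminus (A \cap B) = (A \setminus B) \cup (B \setminus A)

\{\text{one}, \text{two}, \text{three}, \text{four}\} \setminus \{\text{two}, \text{three}\}
  = \{\text{one}\} \cup \{\text{four}\}
```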

The complete code:

Code:

     1  #!/usr/bin/perl -w
     2  @DIR = qw( one two three );
     3  @DAT = qw( two three four );
     4
     5  foreach( @DIR ) { $seen{$_}++ };
     6  foreach( @DAT ) { $seen{$_}++ };
     7
     8  @union     = keys %seen;
     9  @intersect = grep { $seen{$_} == 2 } keys %seen;
    10  @wanted    = grep { $seen{$_} == 1 } keys %seen;
    11
    12
    13  print "intersect: @intersect\n";
    14  print "union: @union\n";
    15  print "wanted: @wanted\n";

Lines 2-3 are just another way of creating arrays of strings. See the [tt]perlop[/tt] manual page (section "Quote and Quote-like Operators") for information on the [tt]qw[/tt] operator.
Line 5 steps through each string in @DIR and increments its count in the [tt]%seen[/tt] hash. So after this line, [tt]%seen[/tt] should look like:

Code:

{    "one" => 1,
     "two" => 1,
     "three" => 1
}

Line 6 does the same for @DAT. So after this line, [tt]%seen[/tt] should look like:

Code:

{    "one" => 1,
     "two" => 2,
     "three" => 2,
     "four" => 1
}

Line 8 just places all the keys of [tt]%seen[/tt] into the [tt]@union[/tt] array (in no particular order, since [tt]keys[/tt] does not sort).
Line 9 uses [tt]grep[/tt] to keep only the strings seen twice (once in [tt]@DIR[/tt], once in [tt]@DAT[/tt]) and stores them in [tt]@intersect[/tt].
Line 10 uses [tt]grep[/tt] to keep only the strings seen once (in either [tt]@DIR[/tt] or [tt]@DAT[/tt], but not both) and stores them in [tt]@wanted[/tt].
Hope this explains my code. Cheers, Neil
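A variation on the %seen idea, sketched below, gives the two separate arrays the original question asked for rather than one combined @wanted list. Instead of counting, it records which array each name came from using bit flags (1 for @DIR, 2 for @DAT); the flag values and variable names are illustrative choices, not from the thread. Using |= rather than ++ also means a name listed twice in the same array is still classified correctly, whereas a count of 2 would be mistaken for "in both".

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @DIR = qw( one two three );
my @DAT = qw( two three four );

# Record WHERE each name was seen: 1 = @DIR, 2 = @DAT, 3 = both.
my %seen;
$seen{$_} |= 1 for @DIR;
$seen{$_} |= 2 for @DAT;

my @only_dir = sort grep { $seen{$_} == 1 } keys %seen;   # in @DIR only
my @only_dat = sort grep { $seen{$_} == 2 } keys %seen;   # in @DAT only
my @both     = sort grep { $seen{$_} == 3 } keys %seen;   # in both

print "only in \@DIR: @only_dir\n";   # prints "only in @DIR: one"
print "only in \@DAT: @only_dat\n";   # prints "only in @DAT: four"
print "in both: @both\n";             # prints "in both: three two"
```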

Tek-Tips is the largest IT community on the Internet today!
