compare the contents of 2 directories

Status
Not open for further replies.

derekJon

Programmer
Dec 5, 2001
I want to compare 2 arrays (@DAT & @DIR, which hold the files in 2 different directories) and return an array of the entries in @DAT that don't exist in @DIR, and an array of the entries in @DIR that don't exist in @DAT. I have written the following code, which appears to work. However, I'm new to Perl and would like any hints/suggestions on how it might be improved. The arrays will be holding substantial amounts of data, so performance is an issue:

my @DIR = ("one", "two", "three");
my @DAT = ("two", "three", "four");

# Remove every entry common to both arrays; what remains in each
# array is the set of entries missing from the other.
for (my $cntDAT = 0; $cntDAT < scalar(@DAT); $cntDAT++) {
    for (my $cntDIR = 0; $cntDIR < scalar(@DIR); $cntDIR++) {
        if ($DIR[$cntDIR] eq $DAT[$cntDAT]) {
            splice(@DIR, $cntDIR, 1);
            splice(@DAT, $cntDAT, 1);
            $cntDAT--;    # re-check the element that shifted into this slot
            last;
        }
    }
}
 
The following is an alternative to your code. I have not benchmarked it, but it could be a bit faster.

[tt]
# Elements in @DIR that are NOT in @DAT

@DIR = ("one", "two", "three");
@DAT = ("two", "three", "four");

%temp = ();
@temp{@DIR} = ();      # create one key per element of @DIR

foreach (@DAT)
{
    delete $temp{$_};  # drop every key that also appears in @DAT
}

@list = sort(keys %temp);
print "Elements in \@DIR not present in \@DAT : @list";
[/tt]

A temporary hash, %temp, is created with one key for every element in @DIR. After the line
[tt] @temp{@DIR}=();[/tt]
%temp has the keys 'one', 'two' and 'three'.
We then delete the keys in %temp that are also elements of @DAT,
so @list ends up holding the elements of @DIR that are not present in @DAT.
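
The original question asked for the difference in both directions. A minimal sketch of how the same hash idea might be extended to cover that, assuming names are unique within each array. The names %in_dir, %in_dat, @only_in_dir and @only_in_dat are illustrative and do not come from the posts above:

```perl
use strict;
use warnings;

my @DIR = ("one", "two", "three");
my @DAT = ("two", "three", "four");

# Build one lookup hash per array; the values are unused, only the keys matter.
my (%in_dir, %in_dat);
@in_dir{@DIR} = ();
@in_dat{@DAT} = ();

# Keep the entries of each array that have no key in the other array's hash.
my @only_in_dir = grep { !exists $in_dat{$_} } @DIR;
my @only_in_dat = grep { !exists $in_dir{$_} } @DAT;

print "Only in \@DIR: @only_in_dir\n";   # Only in @DIR: one
print "Only in \@DAT: @only_in_dat\n";   # Only in @DAT: four
```

Unlike the nested loops in the question, this leaves @DIR and @DAT unmodified and keeps the original order of the surviving entries.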


HTH
regards
C "Brahmaiva satyam"
-Adi Shankara (788-820 AD)
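
Since the reply above was not benchmarked, here is one way the comparison might be measured with the core Benchmark module. This is only a sketch: the test data is made up, only_in_first_nested is a simplified stand-in for the nested-loop approach rather than the exact code from the question, and both function names are hypothetical.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test data: two overlapping lists of invented file names.
my @dir = map { "file$_" } 1 .. 200;
my @dat = map { "file$_" } 101 .. 300;

# Nested-loop stand-in: compares pairs of entries, O(n * m) in the worst case.
sub only_in_first_nested {
    my ($first, $second) = @_;
    my @only;
    ITEM: for my $item (@$first) {
        for my $other (@$second) {
            next ITEM if $item eq $other;   # found a match, skip this item
        }
        push @only, $item;
    }
    return @only;
}

# Hash approach from the reply above: one key per entry, then delete matches.
sub only_in_first_hash {
    my ($first, $second) = @_;
    my %temp;
    @temp{@$first} = ();
    delete @temp{@$second};                 # hash-slice delete
    return keys %temp;
}

# A negative count asks Benchmark to run each variant for at least
# that many CPU seconds, then prints a table of relative speeds.
cmpthese(-1, {
    nested => sub { my @r = only_in_first_nested(\@dir, \@dat) },
    hash   => sub { my @r = only_in_first_hash(\@dir, \@dat) },
});
```

The actual numbers depend on the machine and on how much the two lists overlap, so it is worth running this against data that resembles the real directories.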
 
Another way:
Code:
@DIR = qw( one two three );
@DAT = qw( two three four );

foreach( @DIR ) { $seen{$_}++ };
foreach( @DAT ) { $seen{$_}++ };

@intersect = grep { $seen{$_} == 2 } keys %seen;
@union = keys %seen;

print "intersect: @intersect\n";
print "union: @union\n";
Cheers, Neil
 
Neil

I'm a bit of a newbie @ Perl so I'm not certain what your code is doing. When I ran it it returned:

intersect: three two
union: one three two four

it should return:

one
four

any ideas?
 
Sorry. Misunderstood your question. From set theory:
A union B: contains all elements that are in A or in B
A intersection B: contains only the elements that are in both A and B
What you want is actually the symmetric difference of the two sets, namely:
(A union B) - (A intersection B)

The complete code:
Code:
     1  #!/usr/bin/perl -w
     2  @DIR = qw( one two three );
     3  @DAT = qw( two three four );
     4
     5  foreach( @DIR ) { $seen{$_}++ };
     6  foreach( @DAT ) { $seen{$_}++ };
     7
     8  @union     = keys %seen;
     9  @intersect = grep { $seen{$_} == 2 } keys %seen;
    10  @wanted    = grep { $seen{$_} == 1 } keys %seen;
    11
    12
    13  print "intersect: @intersect\n";
    14  print "union: @union\n";
    15  print "wanted: @wanted\n";
Lines 2-3 are just another way of creating arrays of strings. See the [tt]perlop[/tt] manual page (the [tt]qw[/tt] entry in [tt]perlfunc[/tt] points there) for information on the [tt]qw[/tt] operator.
Line 5 steps through each string in @DIR, and increments a value in the [tt]%seen[/tt] hash. So after this line, [tt]%seen[/tt] should look like:
Code:
{    "one" => 1,
     "two" => 1,
     "three" => 1
}
Line 6 does the same for @DAT. So after this line, [tt]%seen[/tt] should look like:
Code:
{    "one" => 1,
     "two" => 2,
     "three" => 2,
     "four" => 1
}
Line 8 just places all the keys of [tt]%seen[/tt] into the [tt]@union[/tt] array.
Line 9 uses [tt]grep[/tt] to keep in [tt]@intersect[/tt] only the strings seen twice (once in [tt]@DIR[/tt], once in [tt]@DAT[/tt]).
Line 10 uses [tt]grep[/tt] to keep in [tt]@wanted[/tt] only the strings seen once (in either [tt]@DIR[/tt] or [tt]@DAT[/tt], but not both).
Hope this explains my code. Cheers, Neil
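
The counting approach above does not record which array a string came from, and a name repeated within one array would be counted twice. One possible variation, sketched here under the assumption that two separate result lists are wanted, uses a bit flag per array instead of a counter. The names @only_in_dir, @only_in_dat and @in_both are illustrative, not from the posts above:

```perl
use strict;
use warnings;

my @DIR = qw( one two three );
my @DAT = qw( two three four );

# Bit 1 marks "seen in @DIR", bit 2 marks "seen in @DAT".
# OR-ing a flag is idempotent, so repeated names do not distort the result.
my %seen;
$seen{$_} |= 1 for @DIR;
$seen{$_} |= 2 for @DAT;

# 1 = only in @DIR, 2 = only in @DAT, 3 = in both.
my @only_in_dir = sort grep { $seen{$_} == 1 } keys %seen;
my @only_in_dat = sort grep { $seen{$_} == 2 } keys %seen;
my @in_both     = sort grep { $seen{$_} == 3 } keys %seen;

print "only in \@DIR: @only_in_dir\n";   # only in @DIR: one
print "only in \@DAT: @only_in_dat\n";   # only in @DAT: four
print "in both: @in_both\n";             # in both: three two
```

The results are sorted only to make the output order predictable, since the order returned by [tt]keys[/tt] is not guaranteed.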
 