compare two arrays

Status
Not open for further replies.

nfaber

Technical User
Oct 22, 2001
446
US
Hello all,

I need some code to find the difference between two arrays and put the diff in another array. I did a search on this forum and found some code from raklet and modified it. Here is my code. The output shows that @clockonly exists, but there seems to be nothing in it:

Code:
my %seen = ();
my @clockonly = ():

foreach $item (@probtckt, @clktckt) {seen{$item}++;}

foreach $element (keys %seen) {

   if ($seen{$item} < 1) {
         push (@clockonly, $seen{$item});
   }

Any help is appreciated (as always)

Nick
 
Nick,

have you sample data?

Code:
my %seen;
my @clockonly;

push (@probtckt, @clktckt);  #only one array @probtckt

foreach (@probtckt) {
  $seen{$_}++;
}

foreach (keys %seen) {
  push (@clockonly, $seen{$_});
}

Just seen it, foreach element, push $item

HTH
--Paul

cigless ...
 
Thanks for the reply Paul. I tried your code and no joy.

@clockonly is populated with thousands of "1" s. The data looks like this:

@clktckt = qw(IM0001 IM0002 IM0004 IM0005);

@probtckt = qw(IM0001 IM0006 IM0007 IM0008);

What I want @clockonly to contain is every entry that is in @clktckt but not in @probtckt. In this case:

@clockonly will contain (IM0002, IM0004, IM0005).

Thanks,

Nick

 
Change this line:

Code:
foreach (keys %seen) {
  push (@clockonly, $seen{$_});
}
to this:
Code:
foreach (keys %seen) {
  push (@clockonly, $_); #push the Key to the array
}
 
Aye,

I was writing it, and I didn't bother testing after I copped that $element and $item were being confused in the original post.

My Bad ;-)

--Paul

cigless ...
 
There is a module for doing this kind of stuff,


but you could do it manually something like this:

Code:
my @clktckt = qw(IM0001 IM0002 IM0004 IM0005);
my @probtckt = qw(IM0001 IM0006 IM0007 IM0008);

my @clockonly;

LOOP: foreach(@clktckt) {
   for my $i (0 .. $#probtckt) {
      next LOOP if ($_ eq $probtckt[$i]);
   }
   push @clockonly,$_;
}

print "@clockonly";
 
Kevin,

Just my 0.02 here, but I think the hash is the nicest way: you only have to load the hash once and then check the arrays against it, as opposed to scanning, on average, about 50% of the entries in one array for every entry in the other.

If the arrays are fairly large, this could take quite a while to process (thousands of 1's in the array)

--Paul
 
Paul,

I agree, but the code you and Rieekan have posted looks like it will not do what nfaber requested:

what I want @clockonly to contain is every entry that is in @clktckt but not in @probtckt. In this case:
@clockonly will contain (IM0002, IM0004, IM0005).

Code:
my @clktckt = qw(IM0001 IM0002 IM0004 IM0005);
my @probtckt = qw(IM0001 IM0006 IM0007 IM0008);

my %seen;
my @clockonly;

push (@probtckt, @clktckt);  #only one array @probtckt

foreach (@probtckt) {
  $seen{$_}++;
}

foreach (keys %seen) {
  push (@clockonly, $_); #push the Key to the array
}

print "@clockonly";

prints:

IM0001 IM0008 IM0006 IM0002 IM0004 IM0007 IM0005

which is the unique elements of both arrays. With a little bit of tweaking the code the hash will work though. :)
 
Thanks for the replies folks, Kevin seems to be right. There will be thousands of entries in each array and I seem to be getting a merge rather than a difference. I should be getting less than 100 that exist in @clktckt and not in @probtckt.

Nick
 
Code:
my %seen;
my @clockonly;
my @clktckt = qw(IM0001 IM0002 IM0004 IM0005);
my @probtckt = qw(IM0001 IM0006 IM0007 IM0008);

foreach (@clktckt) {
  $seen{$_}=0;
}

foreach (@probtckt) {
  if (exists $seen{$_}) {
     $seen{$_}++;
  }
}
foreach (keys %seen) {
  if ($seen{$_} == 0) {
    push (@clockonly, $_);
  }
}

foreach (@clockonly) {
  print "$_\n";
}

Still using a hash
--Paul

cigless ...
 
Thanks for all the replies folks. Stars for most. Paul, I will test your code tomorrow, but first, it seems as though there is nothing in one of my arrays and I need to troubleshoot that first.

As always,

Nick
 
My results
Code:
IM0002
IM0004
IM0005
--Paul

cigless ...
 
Looks good Paul, I wonder what the difference would be between a real world test using arrays like I did, and yours only using hashes. Might be interesting to run a test.
 
scr3w tests, let's benchmark ... ;-)

[aside]that could look rude, apologies[/aside]
--Paul

BTW I'm using an array as well, just not over and over

cigless ...
 
no offense taken. By test I meant benchmark which I thought you would understand, and you did. [2thumbsup]
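
For anyone who wants to run the comparison discussed above, here is a rough sketch using the core Benchmark module. The data set (1000 generated ticket IDs per array, half of them overlapping) and the sub names diff_nested and diff_hash are made up for illustration; they are not from anyone's post, and real timings will depend on your data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 tickets each, overlapping on IM0501..IM1000.
my @clktckt  = map { sprintf 'IM%04d', $_ } 1 .. 1000;
my @probtckt = map { sprintf 'IM%04d', $_ } 501 .. 1500;

# Nested-loop approach: scan the second array for every element of the first.
sub diff_nested {
    my ($left, $right) = @_;
    my @only;
    OUTER: foreach my $item (@$left) {
        foreach my $other (@$right) {
            next OUTER if $item eq $other;
        }
        push @only, $item;
    }
    return @only;
}

# Hash approach: load the second array into a lookup table once,
# then test each element of the first array against it.
sub diff_hash {
    my ($left, $right) = @_;
    my %seen;
    @seen{@$right} = ();
    return grep { !exists $seen{$_} } @$left;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    nested => sub { my @d = diff_nested(\@clktckt, \@probtckt) },
    hash   => sub { my @d = diff_hash(\@clktckt, \@probtckt) },
});
```

Both subs keep the elements in their original order, so their results can be compared directly before trusting the timings.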
 
From the Perl Cookbook:

4.8. Finding Elements in One Array but Not Another
4.8.1. Problem

You want to find elements that are in one array but not another.
4.8.2. Solution

You want to find elements in @A that aren't in @B. Build a hash of the keys of @B to use as a lookup table. Then check each element in @A to see whether it is in @B.
4.8.2.1. Straightforward implementation

# assume @A and @B are already loaded
%seen = ( ); # lookup table to test membership of B
@aonly = ( ); # answer

# build lookup table
foreach $item (@B) { $seen{$item} = 1 }

# find only elements in @A and not in @B
foreach $item (@A) {
    unless ($seen{$item}) {
        # it's not in %seen, so add to @aonly
        push(@aonly, $item);
    }
}

4.8.2.2. More idiomatic version

my %seen; # lookup table
my @aonly; # answer

# build lookup table
@seen{@B} = ( );

foreach $item (@A) {
    push(@aonly, $item) unless exists $seen{$item};
}

4.8.2.3. Loopless version

my @A = ...;
my @B = ...;

my %seen;
@seen {@A} = ( );
delete @seen {@B};

my @aonly = keys %seen;

4.8.3. Discussion

As with nearly any problem in Perl that asks whether a scalar is in one list or another, this one uses a hash. First, process @B so that the %seen hash records each element from @B by setting its value to 1. Then process @A one element at a time, checking whether that particular element had been in @B by consulting the %seen hash.

The given code retains duplicate elements in @A. This can be fixed easily by adding the elements of @A to %seen as they are processed:

foreach $item (@A) {
    push(@aonly, $item) unless $seen{$item};
    $seen{$item} = 1; # mark as seen
}
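
To see the duplicate behaviour on concrete data, here is a small sketch. It is not the Cookbook's code: it swaps the foreach loops for the equivalent grep idiom, and the colour lists are invented sample data.

```perl
use strict;
use warnings;

# Hypothetical sample data with repeated elements in @A.
my @A = qw(red blue red green blue);
my @B = qw(green);

# Variant 1: membership test only, so duplicates in @A survive.
my %in_b;
@in_b{@B} = ();
my @with_dups = grep { !exists $in_b{$_} } @A;

# Variant 2: mark each element as seen once it has been tested,
# so later copies of the same element are skipped.
my %seen = map { $_ => 1 } @B;
my @no_dups = grep { !$seen{$_}++ } @A;

print "@with_dups\n";   # red blue red blue
print "@no_dups\n";     # red blue
```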

The first two solutions differ mainly in how they build the hash. The first iterates through @B. The second uses a hash slice to initialize the hash. A hash slice is easiest illustrated by this example:

$hash{"key1"} = 1;
$hash{"key2"} = 2;

which is equivalent to:

@hash{"key1", "key2"} = (1,2);

The list in the curly braces holds the keys; the list on the right holds the values. We initialize %seen in the first solution by looping over each element in @B and setting the appropriate value of %seen to 1. In the second, we simply say:

@seen{@B} = ( );

This uses items in @B as keys for %seen, setting each corresponding value to undef, because there are fewer values on the right than places to put them. This works out here because we check for existence of the key, not logical truth or definedness of the value. If we needed true values, a slice could still shorten our code:

@seen{@B} = (1) x @B;

In the third solution, we make use of this property even further and avoid explicit loops altogether. (Not that avoiding loops should be construed as being particularly virtuous; we're just showing you that there's more than one way to do it.) The slice assignment makes any element that was in @A a key, and the slice deletion removes from the hash any keys that were elements of @B, leaving those that were only in @A.

A fairly common situation where this might arise is when you have two files and would like to know which lines from the second file either were or weren't in the first. Here's a simple solution based on this recipe:

open(OLD, $path1) || die "can't open $path1: $!";
@seen{ <OLD> } = ( );
open(NEW, $path2) || die "can't open $path2: $!";
while (<NEW>) {
    print if exists $seen{$_};
}

This shows the lines in the second file that were already seen in the first one. Use unless instead of if to show the lines in the second file that were not in the first.

Imagine two files, the first containing the lines:

red
yellow
green
blue

and the second containing:

green
orange
purple
black
yellow

The output using if would be:

green
yellow

and the output using unless would be:

orange
purple
black
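
The colour example above can be reproduced without creating any files. The sketch below is an adaptation, not the Cookbook's code: it uses lexical filehandles opened on in-memory strings (the variable names $old_text and $new_text are invented) and collects both result sets in one pass.

```perl
use strict;
use warnings;

# In-memory stand-ins for the two files (hypothetical; real code would open paths).
my $old_text = "red\nyellow\ngreen\nblue\n";
my $new_text = "green\norange\npurple\nblack\nyellow\n";

open(my $old, '<', \$old_text) or die "can't open old data: $!";
open(my $new, '<', \$new_text) or die "can't open new data: $!";

my %seen;
@seen{ <$old> } = ();       # every line of the first file becomes a key

my (@common, @new_only);
while (my $line = <$new>) {
    if (exists $seen{$line}) { push @common,   $line }
    else                     { push @new_only, $line }
}

print "in both:\n", @common;             # green, yellow
print "only in second:\n", @new_only;    # orange, purple, black
```

Lines are compared with their trailing newlines intact, so a last line without a newline would not match its counterpart in the other file.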

You could even do this from the command line; given a suitable cat(1) program, it's easy:

% perl -e '@s{`cat OLD`}=( ); exists $s{$_} && print for `cat NEW`'
% perl -e '@s{`cat OLD`}=( ); exists $s{$_} || print for `cat NEW`'

You'd find that you just emulated these calls to the Unix fgrep(1) program:

% fgrep -Ff OLD NEW
% fgrep -vFf OLD NEW

dmazzini
GSM System and Telecomm Consultant

 
#Still TIMTOWTDI

@clktckt = qw(IM0001 IM0002 IM0004 IM0005);
@probtckt = qw(IM0001 IM0006 IM0007 IM0008);

# "exist in @clktckt and not in @probtckt."

my %clockonly =();

@clockonly {@clktckt}= @clktckt;
delete @clockonly {@probtckt};

my @clockonly = keys %clockonly ;

# check it
map { print "$_\n";} @clockonly;
 
Thanks for all the replies folks. My bad here. dmazzini, I actually got my original code from the cookbook. The problem was I made my @problem array global and assumed it would be available to the subroutine (not). Once I passed it to the sub as a reference and de-referenced it in my sub, it started working, so we were all right.

[surprise]

[blush]
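
For reference, a minimal sketch of the fix described above: pass both arrays to the subroutine as references and dereference them inside it, so nothing depends on what is visible globally. The sub name only_in_first is made up for illustration; nfaber's actual subroutine was not posted.

```perl
use strict;
use warnings;

# Return the elements of the first array that do not appear in the second.
# Both arguments are array references, so the sub does not rely on globals.
sub only_in_first {
    my ($first_ref, $second_ref) = @_;
    my %seen;
    @seen{ @{$second_ref} } = ();                      # lookup table from the second array
    return grep { !exists $seen{$_} } @{$first_ref};   # keeps the original order
}

my @clktckt  = qw(IM0001 IM0002 IM0004 IM0005);
my @probtckt = qw(IM0001 IM0006 IM0007 IM0008);

my @clockonly = only_in_first(\@clktckt, \@probtckt);
print "@clockonly\n";   # IM0002 IM0004 IM0005
```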

 