Remove Duplicates Ignore Dates 1

mte0910 · Oct 27, 2008

I have this code that I am using to remove the duplicate entries from a text file. The problem is, I had to remove the date from the entries, otherwise it would make them unique. If I wanted to keep the date, how could I compare each line and remove the duplicates while ignoring the date?

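Code:

use strict;
use warnings;

open (my $in,  '<', 'c:\tmp00.txt') or die "Cannot read c:\\tmp00.txt: $!";
open (my $out, '>', 'c:\tmp01.txt') or die "Cannot write c:\\tmp01.txt: $!";
my %hTmp;
while (my $sLine = <$in>) {
    # print a line only the first time it is seen
    print $out $sLine unless ($hTmp{$sLine}++);
}
close ($in); close ($out);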

max1x · Oct 27, 2008

Can you share your data structure? It's hard to tell without seeing it.

mte0910 · Oct 27, 2008

I should have posted this code instead...

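Code:

use strict;
use warnings;

open (my $in,  '<', 'C:\tmp00.txt') or die "Cannot read C:\\tmp00.txt: $!";
open (my $out, '>', 'C:\tmp01.txt') or die "Cannot write C:\\tmp01.txt: $!";
my %hTmp;
while (my $sLine = <$in>) {
    # lc() makes the duplicate check case-insensitive
    print $out $sLine unless ($hTmp{lc($sLine)}++);
}
close ($in); close ($out);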
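The lc() version already shows the general trick: the hash key does not have to be the line itself. It can be any value computed from the line, while the untouched line is what gets printed. A minimal sketch of that idea, assuming the lines are already in an array; `dedupe_by` is a hypothetical helper name, not something from this thread. Note that keying on "everything after the first comma" keeps the *first* dated line, not the newest; keeping the newest needs the date comparison discussed further down.

```perl
use strict;
use warnings;

# Return the lines whose key has not been seen before, in input order.
# $key_of is a code ref that computes the comparison key from a line;
# the line itself is returned unchanged. (Hypothetical helper.)
sub dedupe_by {
    my ($key_of, @lines) = @_;
    my %seen;
    return grep { !$seen{ $key_of->($_) }++ } @lines;
}

my @lines = (
    "alphadata,10.10.10.110,server1\n",
    "AlphaData,10.10.10.110,server1\n",
    "betadata,10.10.10.111,server1\n",
);

# Case-insensitive comparison, like the lc() version of the loop.
print dedupe_by(sub { lc $_[0] }, @lines);

# Ignoring a leading date: the key is everything after the first comma.
my @dated = (
    "9/24/2008,charliedata,10.10.10.112,server1\n",
    "10/18/2008,charliedata,10.10.10.112,server1\n",
);
print dedupe_by(sub { (split /,/, $_[0], 2)[1] }, @dated);
```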

mte0910 · Oct 27, 2008

Right now, the file looks like...

alphadata,10.10.10.110,server1
betadata,10.10.10.111,server1
charliedata,10.10.10.112,server1

I would like to use a file that looks like this...
9/9/2008,alphadata,10.10.10.110,server1
9/15/2008,betadata,10.10.10.111,server1
9/24/2008,charliedata,10.10.10.112,server1

Assuming there were two lines...
9/24/2008,charliedata,10.10.10.112,server1
10/18/2008,charliedata,10.10.10.112,server1

I would like to delete one of them, and preserve the "newest" for a result of...

10/18/2008,charliedata,10.10.10.112,server1

Kirsle · Oct 28, 2008

Code:

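open (IN, "infile")    or die "Cannot read infile: $!";
open (OUT, ">outfile") or die "Cannot write outfile: $!";
my $dup = {};
while (<IN>) {
   # everything after the leading date and its comma is the unique part
   my ($uniq) = $_ =~ /^[0-9\/]+\,(.+)$/;
   print OUT $_ unless ($dup->{$uniq}++);
}
close (OUT);
close (IN);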

Code:

[kirsle@epsilon ~]$ cat <<EOF > infile
> 9/9/2008,alphadata,10.10.10.110,server1
> 9/15/2008,betadata,10.10.10.111,server1
> 9/24/2008,charliedata,10.10.10.112,server1
> 9/24/2008,charliedata,10.10.10.112,server1
> 10/18/2008,charliedata,10.10.10.112,server1
> EOF
[kirsle@epsilon ~]$ perl
open (IN, "infile");
open (OUT, ">outfile");
my $dup = {};
while (<IN>) {
   my ($uniq) = $_ =~ /^[0-9\/]+\,(.+)$/;
   print OUT $_ unless ($dup->{$uniq}++);
}
close (OUT);
close (IN);
__END__
[kirsle@epsilon ~]$ cat outfile
9/9/2008,alphadata,10.10.10.110,server1
9/15/2008,betadata,10.10.10.111,server1
9/24/2008,charliedata,10.10.10.112,server1

That should get ya a step further. If you wanted the date too:

Code:

my ($date,$uniq) = $_ =~ /^([0-9\/]+)\,(.+)$/;

and then do whatever with the $date.
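One thing to watch with the two-capture regex: if a line does not start with a date, the match returns an empty list, so $date and $uniq are undefined. With warnings enabled that produces "uninitialized value" warnings, and every such line ends up under the same hash key (undef becomes ""), so only the first one survives. A hedged sketch of one way to guard against that, assuming an m/d/yyyy date followed by a comma; `split_date` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Split a line into (date, rest). Assumes an m/d/yyyy date followed by a
# comma. A line without such a date gets an undefined date and the whole
# (chomped) line as its rest, so two different undated lines never end up
# sharing one hash key.
sub split_date {
    my ($line) = @_;
    chomp(my $copy = $line);
    if ($copy =~ m{^(\d{1,2}/\d{1,2}/\d{4}),(.*)$}) {
        return ($1, $2);
    }
    return (undef, $copy);
}

for my $line ("10/18/2008,charliedata,10.10.10.112,server1\n",
              "no date on this line\n") {
    my ($date, $uniq) = split_date($line);
    printf "date=%s uniq=%s\n", defined $date ? $date : '(none)', $uniq;
}
```

Using m{} as the delimiter avoids having to escape the slashes in the date pattern.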

-------------
Cuvou.com | My personal homepage

Code:

perl -e '$|=$i=1;print" oo\n<|>\n_|_";x:sleep$|;print"\b",$i++%2?"/":"_";goto x;'

Kirsle · Oct 28, 2008

*bored*

Code:

[kirsle@epsilon ~]$ perl
open (IN, "infile");
open (OUT, ">outfile");
my $dup = {};
while (<IN>) {
   my ($date,$uniq) = $_ =~ /^([0-9\/]+)\,(.+)$/;
   my @dates = split(/\//, $date, 3);
   my $int = join("",
      sprintf("%02d", $dates[2]),
      sprintf("%02d", $dates[0]),
      sprintf("%02d", $dates[1]),
   );
   unless ($dup->{$uniq} && $dup->{$uniq}->{int} > $int) {
      $dup->{$uniq}->{value} = $_;
      $dup->{$uniq}->{int} = $int;
   }
}
foreach my $line (sort keys %{$dup}) {
   print OUT $dup->{$line}->{value};
}
close (OUT);
close (IN);
__END__
[kirsle@epsilon ~]$ cat outfile
9/9/2008,alphadata,10.10.10.110,server1
9/15/2008,betadata,10.10.10.111,server1
10/18/2008,charliedata,10.10.10.112,server1

Code explanation:

It reads the lines, separates the date from the unique content, and turns the date into an integer (e.g. 9/15/2008 becomes 20080915) so that dates can be compared as plain numbers. It then files each line away in $dup->{$uniq}, keyed by the unique half of the line, storing the full line as {value} and the date integer as {int}.
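The date-to-integer step is what makes the comparison work: compared as raw strings, "10/18/2008" sorts before "9/24/2008" because "1" comes before "9". A minimal sketch of the same conversion with a single format string instead of three sprintf calls and a join, assuming month/day/year order as in the sample data; `date_to_int` is a hypothetical name:

```perl
use strict;
use warnings;

# Turn an m/d/yyyy date into a yyyymmdd integer so dates compare numerically.
# Assumes month/day/year order, as in the sample data.
sub date_to_int {
    my ($date) = @_;
    my ($m, $d, $y) = split m{/}, $date;
    return sprintf("%04d%02d%02d", $y, $m, $d) + 0;
}

print date_to_int("9/15/2008"), "\n";    # prints 20080915
print date_to_int("10/18/2008") > date_to_int("9/24/2008")
    ? "newer\n" : "older\n";             # prints "newer"
```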

On later lines with the same unique part, it checks whether this line's integer is at least as large as the one stored last time (i.e. the date is the same or newer). If so, it replaces {value} and {int} with the values from this new line.

Since it can't know whether a newer version of the current line is still coming, it can't print anything until the whole file has been read (if it printed every time it saw a newer date on a line, you'd still have duplicates in the final file). That means the original line order is lost, so I just had it sort the results alphabetically by their unique parts.
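If the original order matters, one option is to remember the order in which each unique part first appears and print the stored lines in that order instead of sorting. A hedged sketch, not Kirsle's code: it assumes m/d/yyyy dates, skips lines that lack a leading date, and uses a hypothetical `keep_newest_in_order` helper working on an in-memory list:

```perl
use strict;
use warnings;

# Keep the newest line for each unique part (everything after the date),
# but output the unique parts in the order they first appeared rather than
# alphabetically. Lines without a leading m/d/yyyy date are skipped.
sub keep_newest_in_order {
    my (%best, @order);
    for my $line (@_) {
        my ($date, $uniq) = $line =~ m{^(\d+/\d+/\d+),(.+)$} or next;
        my ($m, $d, $y) = split m{/}, $date;
        my $int = sprintf("%04d%02d%02d", $y, $m, $d);
        # record first appearance before the entry is created
        push @order, $uniq unless exists $best{$uniq};
        # a same-or-newer date replaces the stored line
        if (!exists $best{$uniq} || $int >= $best{$uniq}{int}) {
            $best{$uniq} = { int => $int, value => $line };
        }
    }
    return map { $best{$_}{value} } @order;
}

my @lines = (
    "9/24/2008,charliedata,10.10.10.112,server1\n",
    "9/9/2008,alphadata,10.10.10.110,server1\n",
    "10/18/2008,charliedata,10.10.10.112,server1\n",
);
print keep_newest_in_order(@lines);
```

Here the charliedata line comes out first (with the 10/18/2008 date) because charliedata appeared first in the input.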

Questions?


KevinADC · Oct 28, 2008

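If the dates in the file are always in ascending order, the highest date is simply the last occurrence of each line, so it can be retained without comparing dates at all (just let later lines overwrite earlier ones in the hash). If the dates are in no particular order, then you will have to use something like what Kirsle has posted.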
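Under the ascending-order assumption, a sketch of the "last one wins" approach: each line overwrites the stored line for its unique part, and an array remembers first appearance so the output keeps the file's order. This only returns the newest line if the file really is sorted by date; otherwise it returns the last line seen. `keep_last` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Assumes the file is in ascending date order, so the last occurrence of a
# unique part is the newest. No dates are compared: later lines simply
# overwrite earlier ones, and @order keeps the order of first appearance.
sub keep_last {
    my (%last, @order);
    for my $line (@_) {
        chomp(my $copy = $line);             # so a final line without "\n" still matches
        my (undef, $uniq) = split /,/, $copy, 2;
        $uniq = $copy unless defined $uniq;  # no comma: key on the whole line
        push @order, $uniq unless exists $last{$uniq};
        $last{$uniq} = $line;
    }
    return map { $last{$_} } @order;
}

my @lines = (
    "9/9/2008,alphadata,10.10.10.110,server1\n",
    "9/24/2008,charliedata,10.10.10.112,server1\n",
    "10/18/2008,charliedata,10.10.10.112,server1\n",
);
print keep_last(@lines);
```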

------------------------------------------
- Kevin, perl coder unexceptional!

mte0910 · Oct 29, 2008

Worked like a charm!
Now I just have to figure out why.

Tek-Tips is the largest IT community on the Internet today!


