How do I remove duplicate lines from a file?

Working with Text Files

by KevinADC

Problem :

You have a text file with many duplicate lines, and you want to remove the duplicates while keeping the first occurrence of each line in its original order.

Solution :

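Use Perl's in-place editing (the $^I variable) and a hash that records which lines have already been seen. Setting $^I to '.bac' keeps a backup of the original file under the same name with .bac appended.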

Code:
[ol]
[li][gray]#!/usr/bin/perl[/gray][/li]
[li][/li]
[li][link http://perldoc.perl.org/functions/use.html][black][b]use[/b][/black][/link] [green]strict[/green][red];[/red][/li]
[li][black][b]use[/b][/black] [green]warnings[/green][red];[/red][/li]
[li][/li]
[li][link http://perldoc.perl.org/functions/my.html][black][b]my[/b][/black][/link] [blue]$file[/blue] = [red]'[/red][purple]/path/to/file.txt[/purple][red]'[/red][red];[/red][/li]
[li][black][b]my[/b][/black] [blue]%seen[/blue] = [red]([/red][red])[/red][red];[/red][/li]
[li][red]{[/red][/li]
[li]   [link http://perldoc.perl.org/functions/local.html][black][b]local[/b][/black][/link] [blue]@ARGV[/blue] = [red]([/red][blue]$file[/blue][red])[/red][red];[/red][/li]
[li]   [black][b]local[/b][/black] [blue]$^I[/blue] = [red]'[/red][purple].bac[/purple][red]'[/red][red];[/red][/li]
[li]   [olive][b]while[/b][/olive][red]([/red]<>[red])[/red][red]{[/red][/li]
[li]      [blue]$seen[/blue][red]{[/red][blue]$_[/blue][red]}[/red]++[red];[/red][/li]
[li]      [olive][b]next[/b][/olive] [olive][b]if[/b][/olive] [blue]$seen[/blue][red]{[/red][blue]$_[/blue][red]}[/red] > [fuchsia]1[/fuchsia][red];[/red][/li]
[li]      [link http://perldoc.perl.org/functions/print.html][black][b]print[/b][/black][/link][red];[/red][/li]
[li]   [red]}[/red][/li]
[li][red]}[/red][/li]
[li][black][b]print[/b][/black] [red]"[/red][purple]finished processing file.\n[/purple][red]"[/red][red];[/red][/li]
[/ol]
[tt]------------------------------------------------------------
Pragmas (perl 5.8.8) used :
[ul]
[li][link http://perldoc.perl.org/strict.html]strict[/link] - Perl pragma to restrict unsafe constructs[/li]
[li][link http://perldoc.perl.org/warnings.html]warnings[/link] - Perl pragma to control optional warnings[/li]
[/ul]
[/tt]

Discussion :

By duplicate lines I mean lines that are exactly the same, character for character, including whitespace. If runs of extra spaces should not count as a difference, you could collapse each run into a single space after line number 11 and before line number 12 of the numbered listing above (that is, right after the while line and before the $seen{$_}++ line):

Code:
tr/ //s;

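However, tr changes $_ itself, so the squeezed line is what gets written back. If you want to keep the original line with all its spaces as they were, you have to make a temporary copy of it: squeeze one copy for the hash lookup and print the untouched one back into the file.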
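A minimal sketch of that temporary-copy approach, written as a standalone subroutine so it can be tried without editing a file. The name dedupe_ignoring_spaces and the sample lines are hypothetical and not part of the original script. The copy is squeezed with tr and used only as the hash key, while the untouched line is what gets kept.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: returns the given lines with duplicates removed.
# Lines are compared after squeezing runs of spaces, but the original
# (unsqueezed) first occurrence is what is returned.
sub dedupe_ignoring_spaces {
   my @lines = @_;
   my %seen  = ();
   my @kept  = ();
   for my $line (@lines) {
      (my $key = $line) =~ tr/ //s;   # squeeze spaces in a copy only
      next if $seen{$key}++;          # already seen in squeezed form
      push @kept, $line;              # keep the original spacing
   }
   return @kept;
}

# Made-up sample data: lines 1, 2 and 4 differ only in the number of spaces.
my @out = dedupe_ignoring_spaces("a  b\n", "a b\n", "c\n", "a   b\n");
print @out;   # prints "a  b" and "c", each on its own line
```

Inside the in-place loop the same idea would be `(my $key = $_) =~ tr/ //s; next if $seen{$key}++; print;`, assuming only runs of spaces (not tabs) should be ignored.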

Code without markup :

Code:
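#!/usr/bin/perl

use strict;
use warnings;

my $file = '/path/to/file.txt';   # file to de-duplicate (edited in place)
my %seen = ();                    # counts how often each line has appeared
{
   local @ARGV = ($file);         # make <> read from $file
   local $^I = '.bac';            # in-place edit; original is kept as file.txt.bac
   while(<>){
      $seen{$_}++;                # count this exact line
      next if $seen{$_} > 1;      # skip it if it has appeared before
      print;                      # first occurrence: write it back to the file
   }
}
print "finished processing file.\n";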
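
To try the script safely, the same loop can be pointed at a scratch file first. The sketch below assumes a temporary directory created with File::Temp and made-up sample lines; neither is part of the original FAQ. It uses the compact form `print unless $seen{$_}++`, which behaves the same as the increment followed by `next if $seen{$_} > 1`.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch file so the sketch never touches real data.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.txt";

open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "apple\n", "pear\n", "apple\n", "fig\n", "pear\n";
close $out or die "Cannot close $file: $!";

my %seen = ();
{
   local @ARGV = ($file);       # make <> read from the scratch file
   local $^I = '.bac';          # edit in place, keep sample.txt.bac as backup
   while(<>){
      print unless $seen{$_}++; # print only the first occurrence of each line
   }
}

# Read the edited file back to check the result.
open my $in, '<', $file or die "Cannot read $file: $!";
my @result = <$in>;
close $in;

print @result;                  # apple, pear, fig (original order kept)
print "backup exists\n" if -e "$file.bac";
```

The backup file holds the original five lines, so a bad run can be undone by copying it back over the edited file.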
