How do I remove duplicate lines from a file?

by KevinADC Posted Feb 12, 2007 (Edited Feb 15, 2007)

Problem :

You have a text file with many duplicate lines, and you want to remove all the duplicates while keeping the remaining lines in their original order.

Solution :

Use Perl's in-place editor ($^I) and a hash that records every line already seen.

Code:

[ol]
[li][gray]#!/usr/bin/perl[/gray][/li]
[li][/li]
[li][link http://perldoc.perl.org/functions/use.html][black][b]use[/b][/black][/link] [green]strict[/green][red];[/red][/li]
[li][black][b]use[/b][/black] [green]warnings[/green][red];[/red][/li]
[li][/li]
[li][link http://perldoc.perl.org/functions/my.html][black][b]my[/b][/black][/link] [blue]$file[/blue] = [red]'[/red][purple]/path/to/file.txt[/purple][red]'[/red][red];[/red][/li]
[li][black][b]my[/b][/black] [blue]%seen[/blue] = [red]([/red][red])[/red][red];[/red][/li]
[li][red]{[/red][/li]
[li]   [link http://perldoc.perl.org/functions/local.html][black][b]local[/b][/black][/link] [blue]@ARGV[/blue] = [red]([/red][blue]$file[/blue][red])[/red][red];[/red][/li]
[li]   [black][b]local[/b][/black] [blue]$^I[/blue] = [red]'[/red][purple].bac[/purple][red]'[/red][red];[/red][/li]
[li]   [olive][b]while[/b][/olive][red]([/red]<>[red])[/red][red]{[/red][/li]
[li]      [blue]$seen[/blue][red]{[/red][blue]$_[/blue][red]}[/red]++[red];[/red][/li]
[li]      [olive][b]next[/b][/olive] [olive][b]if[/b][/olive] [blue]$seen[/blue][red]{[/red][blue]$_[/blue][red]}[/red] > [fuchsia]1[/fuchsia][red];[/red][/li]
[li]      [link http://perldoc.perl.org/functions/print.html][black][b]print[/b][/black][/link][red];[/red][/li]
[li]   [red]}[/red][/li]
[li][red]}[/red][/li]
[li][black][b]print[/b][/black] [red]"[/red][purple]finished processing file.\n[/purple][red]"[/red][red];[/red][/li]
[/ol]

[tt]------------------------------------------------------------
Pragmas (perl 5.8.8) used :
[ul]
[li][link http://perldoc.perl.org/strict.html]strict[/link] - Perl pragma to restrict unsafe constructs[/li]
[li][link http://perldoc.perl.org/warnings.html]warnings[/link] - Perl pragma to control optional warnings[/li]
[/ul]
[/tt]
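If the `local @ARGV` / `$^I` combination looks like magic, here is a sketch that does roughly the same job with explicit filehandles: read the original file, write the first copy of each line to a new file, keep the original as a `.bac` backup, and move the new file into place. The sub name `dedupe_file_explicit` and the scratch name `$file.tmp` are assumptions made for this illustration. This is not perl's exact internal mechanism, which differs between perl versions.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# dedupe_file_explicit is a hypothetical name used only for this sketch.
sub dedupe_file_explicit {
    my ($file) = @_;
    my $tmp = "$file.tmp";    # assumed scratch file name, not something perl uses
    my %seen;

    open my $in,  '<', $file or die "Cannot read $file: $!";
    open my $out, '>', $tmp  or die "Cannot write $tmp: $!";
    while (my $line = <$in>) {
        # $seen{$line}++ returns 0 (false) the first time a line appears,
        # so only the first occurrence is written out.
        print {$out} $line unless $seen{$line}++;
    }
    close $in;
    close $out or die "Cannot finish $tmp: $!";

    # Keep the untouched original as a backup, then move the new copy into place.
    rename $file, "$file.bac" or die "Cannot rename $file: $!";
    rename $tmp,  $file       or die "Cannot rename $tmp: $!";
    return;
}

# Process every file named on the command line.
for my $path (@ARGV) {
    dedupe_file_explicit($path);
}
```

The in-place editor saves you the bookkeeping shown here: with `$^I` set, `<>` opens the file, and a plain `print` writes to the replacement file instead of STDOUT.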

Discussion :

By duplicate lines I mean exactly that: lines that are identical, including white space and every other character. If extra white space should not matter, you could squeeze each run of spaces down to a single space after line 11 and before line 12 of the numbered listing.

Code:

tr/ //s;

But if you want to keep the original line, with all of its white space, in the output, you have to squeeze a temporary copy of the line, use that copy for the duplicate check, and print the unmodified line back into the file.
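A minimal sketch of that temporary-copy approach, keeping the author's in-place editor and `.bac` backup extension. The sub name `dedupe_file_ignoring_spaces` is hypothetical. Note that `tr/ //s` squeezes runs of space characters only; tabs and trailing spaces are left as they are.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# dedupe_file_ignoring_spaces is a hypothetical name used only for this sketch.
sub dedupe_file_ignoring_spaces {
    my ($file) = @_;
    my %seen;
    local @ARGV = ($file);
    local $^I   = '.bac';    # same backup extension as the original script
    while (my $line = <>) {
        # Squeeze runs of spaces in a *copy*; $line itself is untouched.
        (my $key = $line) =~ tr/ //s;
        next if $seen{$key}++;    # skip lines whose squeezed form was seen before
        print $line;              # goes to the in-place output file, not STDOUT
    }
    return;
}

# Process every file named on the command line.
for my $path (@ARGV) {
    dedupe_file_ignoring_spaces($path);
}
```

With this version, `a  b` and `a b` count as duplicates, and whichever appears first is written back exactly as it was.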

Code without markup :

Code:

#!/usr/bin/perl

use strict;
use warnings;

my $file = '/path/to/file.txt';
my %seen = ();
{
   local @ARGV = ($file);
   local $^I = '.bac';
   while(<>){
      $seen{$_}++;
      next if $seen{$_} > 1;
      print;
   }
}
print "finished processing file.\n";
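The two lines that do the real work can be folded into the common Perl idiom `!$seen{$_}++`, which is true only the first time a line is met. The sketch below wraps it in a hypothetical helper, `unique_lines`, and uses it as a filter: it reads the files named on the command line and prints the de-duplicated lines to STDOUT, leaving the input files unchanged.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# unique_lines is a hypothetical helper name for this sketch.
# It returns its arguments with repeats dropped, first occurrences kept in order.
sub unique_lines {
    my %seen;
    # $seen{$_}++ returns the previous count: 0 (false) for a new line,
    # so !$seen{$_}++ is true only the first time each line appears.
    return grep { !$seen{$_}++ } @_;
}

# Filter mode: read the files named on the command line and print the
# de-duplicated lines to STDOUT.
if (@ARGV) {
    print unique_lines(<>);
}
```

Run it as, for example, `perl dedupe.pl /path/to/file.txt > /path/to/unique.txt` (the script and output names are placeholders). To edit the file in place instead, the same idiom works as a one-liner: `perl -i.bak -ne 'print unless $seen{$_}++' /path/to/file.txt`. Because lines are compared exactly, a last line with no trailing newline is not a duplicate of the same text followed by a newline.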
