How to keep just non-duplicate data

Status
Not open for further replies.

diera

Programmer
Mar 21, 2011
28
DE
Hi,

I found a lot of code to remove duplicate data. In my case, I would like to print only the non-duplicate data, that is, the lines that occur exactly once. Any suggestions on how to do it?

Example data:
aaaaaaaaaaa
bbbbbbbbbbb
bbbbbbbbbbb
ccccccccccc
ddddddddddd
ddddddddddd

The desired output:
aaaaaaaaaaa
ccccccccccc

Any help is much appreciated.

Thank you.
 
Hi,

I just found the solution. Here is the code by tadmc, to which I have made some minor modifications.

Code:
#!/usr/bin/perl

use strict;
use warnings;

my $file = '/perlscript/tweet/notbroadcast.txt';

open MYFILE, ">noneduplicate.txt";
select MYFILE;

local @ARGV = ($file);
my %lines;
while (<>) {
    $lines{$_}++;
}
print sort grep $lines{$_} == 1, keys %lines;
 
Hi all,

I have tried using this code to print out just the non-duplicate data.

Code:
#!/usr/bin/perl

use strict;
use warnings;

my $file = '/perlscript/tweet/notbroadcast.txt';

open MYFILE, ">noneduplicate.txt";
select MYFILE;

local @ARGV = ($file);
my %lines;
while (<>) {
    $lines{$_}++;
}
print sort grep $lines{$_} == 1, keys %lines;

Result:
aaaaaaaaaaa
ccccccccccc
dddddddddddddddddddddd

Actually, that is wrong. The expected output is:
aaaaaaaaaaa
ccccccccccc

Where is the error? Thank you for your help.

 
Hi

Got it. The last line of the file is missing its end-of-line mark, so it is counted as a different string from the other ddddddddddd line. To handle all lines equally, either:
Code:
# remove all \n then put them back when printing
perl -ne 'chomp;$l{$_}++;END{print join("\n",sort grep$l{$_}==1,keys%l),"\n"}' /perlscript/tweet/notbroadcast.txt > noneduplicate.txt

# or just add them when missing
perl -ne '$_.="\n"if substr($_,-1)ne"\n";$l{$_}++;END{print sort grep$l{$_}==1,keys%l}' /perlscript/tweet/notbroadcast.txt > noneduplicate.txt


Feherke.
 
Hi

Franco, that is more efficient, but it will work only when each duplicated line occurs an even number of times.

( I mean it will work for "a\nb\nb\nc" and "a\nb\nb\nb\nb\nc" but not for "a\nb\nb\nb\nc" and "a\nb\nb\nb\nb\nb\nc". )


Feherke.
 