Remove duplicates

AquaTeenFryMan · Aug 2, 2006

How can I remove duplicates from a text file that looks like this:

003758:003758:003760:003760:003922:003922:003673:003694:003694:003701:003702:003704:003713

etc...

It is a flat text file with each number separated by a colon. The output, after removing the duplicates, also needs to have each number separated by a colon.

If anyone can help it would be greatly appreciated.

ishnid · Aug 2, 2006

I like one-liners. This'll do it without preserving the ordering:

Code:

$data = join ':', keys %{ { map +( $_ => 1 ), split( ':', $data ) } };

. . . where $data initially contains a line of data in the same format as the one you've posted.
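For anyone finding the one-liner dense, here is a sketch of the same idea split into separate steps. The sub name dedup_unordered is made up for illustration. Like the one-liner, it does not keep the original order, because keys() returns a hash's keys in no particular order.

```perl
use strict;
use warnings;

# Hypothetical helper: the one-liner above, split into separate steps.
sub dedup_unordered {
    my ($data) = @_;

    # 1. Break the line into fields on each colon.
    my @fields = split /:/, $data;

    # 2. Use each field as a hash key. A repeated field just
    #    overwrites the same key, so each value is stored once.
    my %seen = map { $_ => 1 } @fields;

    # 3. Join the unique keys back together with colons.
    #    keys() returns them in no particular order.
    return join ':', keys %seen;
}

my $data = '003758:003758:003760:003760:003922';
print dedup_unordered($data), "\n";
```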

AquaTeenFryMan · Aug 2, 2006

OK, that didn't work. I did something wrong...

Here's what I had:

my $TEXT = "c:/test.txt";
open (TEXT, "$TEXT") or die "Can't open $TEXT: $!\n";
$TEXT = join ':', keys %{ { map +( $_ => 1 ), split( ':', $TEXT ) } };

close(TEXT);

It just spit out a blank text file.

Kirsle · Aug 2, 2006

You didn't read from the filehandle TEXT. When you run that one-liner on $TEXT, $TEXT still holds the string "c:/test.txt", NOT the contents of the file.

Code:

my $TEXT = "c:/test.txt";
open (TEXT, "$TEXT") or die "Can't open $TEXT: $!\n";
$TEXT = <TEXT>; # reads one line
chomp($TEXT);   # drop the trailing newline so the last field isn't "003713\n"

$TEXT = join ':', keys %{ { map +( $_ => 1 ), split( ':', $TEXT ) } };

close (TEXT);
print "$TEXT\n";

Reading from a filehandle into a scalar reads one line at a time (the first line, on the first read). If the file contains multiple lines, read it into an array and loop over each element with foreach. Also, chomp the array to strip the trailing newlines.
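A minimal sketch of that array-and-foreach approach, assuming each line of the file should be de-duplicated on its own. The helper name dedup_each_line is hypothetical, and an in-memory filehandle (open on a reference to a string) stands in for c:/test.txt so the example runs without a file on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: de-duplicate every line read from an open filehandle.
sub dedup_each_line {
    my ($fh) = @_;

    my @lines = <$fh>;   # list context: read ALL remaining lines
    chomp(@lines);       # chomp strips the newline from every element

    my @results;
    foreach my $line (@lines) {
        my %seen = map { $_ => 1 } split /:/, $line;
        push @results, join ':', keys %seen;   # order is not preserved
    }
    return @results;
}

# Demo: an in-memory filehandle stands in for c:/test.txt.
my $contents = "003758:003758:003760\n003922:003922:003673\n";
open my $fh, '<', \$contents or die "Can't open in-memory file: $!";
print "$_\n" for dedup_each_line($fh);
close $fh;
```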

lcs01 · Aug 2, 2006

Kirsle,

I read your solution and I like it! I have another question, though: how can I retain the original order?

I modified your code a bit:

Code:

my $text1 = "003758:003758:003760:003760:003922:003922:003673:003694:003694:003701:003702:003704:003713";
my $text2 = join ':', keys %{ { map +( $_ => 1 ), split( ':', $text1 ) } };

print "$text1\n";
print "$text2\n";

And here is the output:

Code:

003758:003758:003760:003760:003922:003922:003673:003694:003694:003701:003702:003704:003713
003922:003704:003694:003702:003701:003758:003760:003713:003673

Thanks.

rharsh · Aug 2, 2006

Here's another way that will retain the order.

Code:

my $input = '003758:003758:003760:003760:003922:003922:003673:003694:003694:003701:003702:003704:003713';
my (%temp, $output);
$output = join(":", grep(!$temp{$_}++, split(":", $input)));

print $input, "\n";
print $output, "\n";

AquaTeenFryMan · Aug 2, 2006

Wow you guys are great. These forums never let me down.

jriggs420 · Aug 2, 2006

Nice code, rharsh. Would you mind breaking down the grep(!$temp{$_}++ ...) part of line 3 a bit more? It reminds me of some JAPHs I've seen. I'm still trying to grasp hashes. Thanks!

Because a thing seems difficult for you, do not think it impossible for anyone to accomplish.
Marcus Aurelius

ishnid · Aug 3, 2006

Firstly, the %temp hash is set up to keep track of which values have been seen before.

$temp{$_}++ takes the field currently being processed (e.g. 003758, 003760, etc.) and increments the value associated with it in the hash. Post-increment returns the value from before the increment: 0 if we haven't seen the field before (a missing key counts as 0), and greater than 0 if we have.

!$temp{$_}++ negates that result (++ binds more tightly than !, so this is !($temp{$_}++)). It therefore returns a true value if the field hasn't been seen before, and false otherwise.
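A small demonstration of both points, using only core Perl (the variable names are arbitrary). It should print first=0 second=1 is_new=1 is_new2=0.

```perl
use strict;
use warnings;

my %temp;

# Post-increment returns the value BEFORE the increment.
# A missing key is undef, which ++ treats as 0, so the first
# call returns 0 and leaves 1 stored in the hash.
my $first  = $temp{'003758'}++;   # 0
my $second = $temp{'003758'}++;   # 1

# ++ binds more tightly than !, so !$temp{$_}++ means !($temp{$_}++).
my $is_new  = !$temp{'003760'}++ ? 1 : 0;   # unseen key -> true  (1)
my $is_new2 = !$temp{'003760'}++ ? 1 : 0;   # seen key   -> false (0)

print "first=$first second=$second is_new=$is_new is_new2=$is_new2\n";
```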

grep() returns a list of all the values for which the expression passed to it evaluates to a true value. In this case, it will return a list of all the values that haven't already been encountered, thus removing all the duplicates.
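Written out as an explicit loop, the grep() version does the following. This is an illustrative rewrite rather than rharsh's code, and the sub name dedup_keep_order is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the grep(!$temp{$_}++, ...) idiom written out
# as an explicit loop, so each step is visible.
sub dedup_keep_order {
    my ($input) = @_;
    my %temp;      # which values we've already seen
    my @unique;    # values kept, in first-seen order

    foreach my $field (split /:/, $input) {
        # Same test as !$temp{$_}++: true only on the first sighting.
        if (!$temp{$field}++) {
            push @unique, $field;
        }
    }
    return join ':', @unique;
}

print dedup_keep_order('003758:003758:003760:003760:003922:003922:003673'), "\n";
# prints 003758:003760:003922:003673
```

Because @unique is filled in the order fields are first encountered, the original order survives, which is exactly what the hash-keys version could not guarantee.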

AquaTeenFryMan · Aug 3, 2006

How can I get it to read from a file?

rharsh · Aug 3, 2006

Thanks ishnid for the explanation.

To read a file:

Code:

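open my $fh, '<', 'file.txt' or die "Can't open file.txt: $!\n";
while (my $input = <$fh>) {
    chomp $input;    # strip the newline so the last field matches its duplicates
    my %temp;        # fresh "seen" hash for each line
    my $output = join(":", grep(!$temp{$_}++, split(":", $input)));

    #print $input, "\n";
    print $output, "\n";
}
close $fh;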
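Since the original goal was to end up with a de-duplicated text file, here is a sketch that writes the results to a second filehandle instead of the screen. The helper name dedup_stream and the output file name in the comment are placeholders; in-memory filehandles stand in for the real files so the example runs as-is.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line from $in to $out with
# duplicate fields removed, keeping first-seen order.
sub dedup_stream {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        my %temp;   # reset per line, so duplicates are removed per line only
        print {$out} join(':', grep { !$temp{$_}++ } split /:/, $line), "\n";
    }
}

# With real files this would be, e.g. (file names are placeholders):
#   open my $in,  '<', 'c:/test.txt'     or die "Can't read: $!";
#   open my $out, '>', 'c:/test_out.txt' or die "Can't write: $!";
# Here in-memory filehandles stand in for both files.
my $input  = "003758:003758:003760\n003922:003922:003673\n";
my $output = '';
open my $in,  '<', \$input  or die "Can't open input: $!";
open my $out, '>', \$output or die "Can't open output: $!";
dedup_stream($in, $out);
close $in;
close $out;
print $output;
```

Because %temp is reset for every line, a number that appears on two different lines is kept on both; move my %temp; out of the while loop if duplicates should be removed across the whole file.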

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!
