Remove duplicates

Jun 16, 2005
How can I remove duplicates from a text file that looks like this:


003758:003758:003760:003760:003922:003922:003673:003694:003694:003701:003702:003704:003713


etc...

It is a flat text file with each number separated by a colon. The output after removing the duplicates also needs to have each number separated by a colon.

If anyone can help it would be greatly appreciated.
 
I like one-liners. This'll do it without preserving the ordering:
Code:
$data = join ':', keys %{ { map +( $_ => 1 ), split( ':', $data ) } };
. . . where $data initially contains a line of data in the same format as the one you've posted.
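
For anyone who finds the one-liner dense, here is a rough unrolled sketch of the same idea. The sub name dedupe_unordered is made up for illustration and is not part of the post above. Hash keys are unique, so storing every field as a key drops the duplicates, but keys returns them in no particular order.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative only): remove duplicate
# colon-separated fields without preserving their order.
sub dedupe_unordered {
    my ($data) = @_;
    my %seen;
    $seen{$_} = 1 for split /:/, $data;   # duplicate fields collapse into one key
    return join ':', keys %seen;          # keys come back in arbitrary order
}

print dedupe_unordered('003758:003758:003760:003760:003922'), "\n";
```

The printed order can differ from run to run, which is why this approach only fits when ordering doesn't matter.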
 
OK, that didn't work. I did something wrong...

Here's what I had:


my $TEXT = "c:/test.txt";
open (TEXT, "$TEXT") or die "Can't open $TEXT: $!\n";
$TEXT = join ':', keys %{ { map +( $_ => 1 ), split( ':', $TEXT ) } };

close(TEXT);

It just spit out a blank text file.
 
You didn't read from filehandle TEXT. When you run that one-liner on $TEXT, then $TEXT is still "c:/test.txt" and NOT the contents of the file.

Code:
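my $TEXT = "c:/test.txt";
open (TEXT, "$TEXT") or die "Can't open $TEXT: $!\n";
$TEXT = <TEXT>; # reads one line
chomp $TEXT;    # drop the trailing newline so it doesn't stick to the last field

$TEXT = join ':', keys %{ { map +( $_ => 1 ), split( ':', $TEXT ) } };
print "$TEXT\n"; # show the de-duplicated line

close (TEXT);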

When you read from a filehandle into a scalar, you get only the next line of the file (here, the first). If the file contains multiple lines, read it into an array and loop over each item with foreach. Also chomp the lines, so the trailing newline doesn't end up attached to the last field.
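
A minimal sketch of the multi-line case described above, assuming each line should be de-duplicated on its own. The sub name dedupe_lines and the sample data are invented for the example, and an in-memory filehandle stands in for c:/test.txt so the snippet runs by itself.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line from an open filehandle and return
# each one with its duplicate fields removed (order is not preserved).
sub dedupe_lines {
    my ($fh) = @_;
    my @lines = <$fh>;   # list context reads all remaining lines
    chomp @lines;        # strip the newline from every element
    my @result;
    foreach my $line (@lines) {
        my %seen;
        $seen{$_} = 1 for split /:/, $line;
        push @result, join ':', keys %seen;
    }
    return @result;
}

# Demo: an in-memory filehandle used in place of a real file.
my $sample = "003758:003758:003760\n003922:003922\n";
open my $fh, '<', \$sample or die "Can't open in-memory file: $!\n";
print "$_\n" for dedupe_lines($fh);
close $fh;
```

For a real file, open it by name instead of passing a reference to a string.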
 
Kirsle,

I read your solution and I like it! However, I have another question: how can I retain the original order?

I modified your code a bit:

Code:
my $text1 = "003758:003758:003760:003760:003922:003922:003673:003694:003694:003701:003702:003704:003713";
my $text2 = join ':', keys %{ { map +( $_ => 1 ), split( ':', $text1 ) } };

print "$text1\n";
print "$text2\n";

And here is the output:
Code:
003758:003758:003760:003760:003922:003922:003673:003694:003694:003701:003702:003704:003713
003922:003704:003694:003702:003701:003758:003760:003713:003673

Thanks.
 
Here's another way that will retain the order.
Code:
my $input = '003758:003758:003760:003760:003922:003922:003673:003694:003694:003701:003702:003704:003713';
my (%temp, $output);
$output = join(":", grep(!$temp{$_}++, split(":", $input)));

print $input, "\n";
print $output, "\n";
 
Nice code, rharsh. Would you mind breaking down line 3 a bit more, specifically grep(!$temp{$_}++ ...)? It reminds me of some JAPHs I've seen. I'm still trying to grasp hashes. Thanks.

 
Firstly, the %temp hash is set up to keep track of which values have been seen before.

$temp{$_}++ uses the current field that's being processed (i.e. 003758, 003760, etc.) as the hash key and increments the value associated with it. Because it is a post-increment, the expression returns the value from before the increment (i.e. 0 if we haven't seen the field before, >0 if we have).

!$temp{$_}++ adds NOT to the above, so it will return a boolean true value if the value hasn't been seen before, and false otherwise.

grep() returns a list of all the values for which the expression passed to it evaluates to a true value. In this case, it will return a list of all the values that haven't already been encountered, thus removing all the duplicates.
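
To make the steps above concrete, here is the same logic written out as an explicit loop. This is only a sketch: dedupe_ordered and the variable names are invented for illustration, and rharsh's grep version does the same thing in one expression.

```perl
use strict;
use warnings;

# Hypothetical helper: the grep(!$temp{$_}++, ...) idiom unrolled into a loop.
sub dedupe_ordered {
    my ($input) = @_;
    my (%seen, @unique);
    foreach my $field (split /:/, $input) {
        my $count_before = $seen{$field}++;          # old count: 0 the first time
        push @unique, $field unless $count_before;   # keep first occurrences only
    }
    return join ':', @unique;
}

print dedupe_ordered('003758:003758:003760:003760:003922:003922:003673'), "\n";
# prints 003758:003760:003922:003673
```

Because fields are pushed in the order they are first seen, the original ordering survives.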
 
Thanks ishnid for the explanation. :)

To read a file:
Code:
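open FH, "< file.txt" or die "Can't open file.txt: $!\n";
while (my $input = <FH>) {
    chomp $input;   # drop the newline so the last field compares correctly
    my (%temp, $output);
    $output = join(":", grep(!$temp{$_}++, split(":", $input)));

    #print $input, "\n";
    print $output, "\n";
}
close FH;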
 