Help to Elim Dups but keep required data

Status
Not open for further replies.

ppeel

Programmer
Dec 7, 2004
I'm new to Perl and am a little intimidated by some of the syntax. I'd really appreciate any help I can get.
I have a file with multiple lines that share the same "key". The output can have only one line per "key", but the information from all lines with the same key has to be in that one output line.

Input:
key col2 col3
1111 aaaa
1111 aaaa
1111 aaaa ffff
2222 bbbb eeee
2222 bbbb
3333 cccc

Output should look like this:
1111 aaaa ffff
2222 bbbb eeee
3333 cccc

So I guess what I'm trying to do is eliminate duplicate keys while making sure I output all the information available. The third column will either be blank or have something in it. If it has something in it, it will always be the same for that key.
Anybody have any ideas on how to accomplish this in Perl?
 
#!/bin/perl

$line_num = 0;

while (<DATA>) {    # Read each line of data and consolidate duplicate keys
    chomp;          # Remove the trailing newline from $_ (chop would eat a data character if the newline is missing)

    $record = $_;
    @fields = split(/ /, $record);
    $key    = $fields[0] . " " . $fields[1];    # Key is assumed to be the first two fields
    $info   = $fields[2];                       # Third column, if present (not used below)

    # Record the order in which each key is first encountered
    $sort_order{++$line_num} = $key if $output{$key} eq "";
    # Overwrite the previous value if the new value is longer
    $output{$key} = $record if length($record) > length($output{$key});
} # while(<DATA>)


# Print results in hash, $output{}, in original order.
# The sort must be numeric; a plain sort compares strings and would put 10 before 2.
foreach (sort { $a <=> $b } keys(%sort_order)) {
    $line_num = $_;
    $key      = $sort_order{$line_num};
    print "$output{$key}\n";
} # foreach

__DATA__
1111 aaaa
1111 aaaa
1111 aaaa ffff
2222 bbbb eeee
2222 bbbb
3333 cccc
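
For comparison, here is a minimal sketch of the same "keep the longest line per key" idea written to run under use strict and use warnings. It is not the script above, only a variant: merge_records() is a hypothetical helper name, the input is passed as a list instead of being read from DATA, and an array remembers the first-seen order so no numeric sort is needed. It assumes, as the script above does, that the first two space-separated fields form the key.

```perl
use strict;
use warnings;

# merge_records() is a hypothetical helper for illustration only.
# Assumption: fields are whitespace-separated and the first two form the key.
sub merge_records {
    my @lines = @_;
    my (%longest, @order);
    for my $record (@lines) {
        my @fields = split ' ', $record;
        next unless @fields >= 2;              # skip blank or short lines
        my $key = "$fields[0] $fields[1]";
        if (!exists $longest{$key}) {
            push @order, $key;                 # remember first-seen order
            $longest{$key} = $record;
        }
        elsif (length($record) > length($longest{$key})) {
            $longest{$key} = $record;          # keep the fuller line
        }
    }
    return map { $longest{$_} } @order;
}

print "$_\n" for merge_records(
    "1111 aaaa", "1111 aaaa", "1111 aaaa ffff",
    "2222 bbbb eeee", "2222 bbbb", "3333 cccc",
);
```

Using exists instead of comparing against an empty string avoids "uninitialized value" warnings the first time a key is seen.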
 
Wow that was fast! Thank you!
It almost worked. This is what I get for output:

1111 aaaa ffff
2222 bbbb
3333 cccc

The second record should have 'eeee' in the 3rd column.
So it's not quite working the way I need it. Any further suggestions?
 
1111 aaaa ffff
2222 bbbb eeee
3333 cccc


1111 aaaa ffff
2222 bbbb
3333 cccc
 
Sorry, my last reply was not intended to be sent. I was comparing what you want with what you claim you are getting.

Are you sure you are getting what you said? When I ran the program on my end, I got exactly what you want. It's as though the last line of the while loop is not working on your end.

Please review that line to make sure it's the same one I provided.
 
Hmmmm. I did check the last line of the while loop and it looks exactly like yours. I don't know. I'm trying to follow what's going on, but I don't quite understand it yet.
My last line:
$output{$key} = $record if length($record) > length($output{key});
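
One possible explanation, going only by the line as quoted above: it reads $output{key} where the posted script has $output{$key}. Without the "$", Perl treats key as the literal string "key", an entry that is never stored, so the stored length always counts as zero, every record overwrites the previous one, and the last line for each key wins. That would give exactly the "2222 bbbb" result reported. The sketch below illustrates the difference; len_or_zero() is a hypothetical helper and the sample values are taken from the thread's data.

```perl
use strict;
use warnings;

my %output = ("2222 bbbb" => "2222 bbbb eeee");   # fuller line already stored
my $key    = "2222 bbbb";
my $record = "2222 bbbb";                         # shorter duplicate arriving later

# Hypothetical helper: treat a missing hash entry as length 0.
sub len_or_zero { my ($s) = @_; return defined $s ? length($s) : 0 }

my $with_sigil    = len_or_zero($output{$key});   # looks up "2222 bbbb"
my $without_sigil = len_or_zero($output{key});    # looks up the literal string "key"

print "length via \$output{\$key}: $with_sigil\n";
print "length via \$output{key}:  $without_sigil\n";
print "overwrite with \$key? ", (length($record) > $with_sigil    ? "yes" : "no"), "\n";
print "overwrite with key?  ",  (length($record) > $without_sigil ? "yes" : "no"), "\n";
```

With $output{$key} the shorter duplicate does not replace the stored line; with $output{key} it does.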
 
I changed it around a little bit so I could better understand it. Here's what I did:

$rlen = length($record);
$olen = length($output{$key});
if ($rlen > $olen){$output{$key} = $record};


And now it works perfectly. Thanks for your help. I would never have gotten this far without it.

 
Both versions should return the same results. So I am not sure why you had problems.

I often use a "trailing" if condition as it intuitively removes the reader's focus from the condition and places it on the statement that is being performed (i.e., the assignment statement). That was the intent of the trailing "if" (and "unless") when Perl was designed. I think it looks less clunky. But other than that it behaves the same way as the traditional "if" statement.
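
As a small illustration of that equivalence, the sketch below puts the two forms side by side. The helper names keep_longer_trailing() and keep_longer_block() are hypothetical, chosen only for this example; both apply the same "keep the longer string" rule used in the script.

```perl
use strict;
use warnings;

# Hypothetical helper using the trailing (statement modifier) form.
sub keep_longer_trailing {
    my ($current, $candidate) = @_;
    $current = $candidate if length($candidate) > length($current);
    return $current;
}

# Hypothetical helper using the traditional block form.
sub keep_longer_block {
    my ($current, $candidate) = @_;
    if (length($candidate) > length($current)) {
        $current = $candidate;
    }
    return $current;
}

# Try both orders: candidate longer than current, then shorter.
for my $pair (["2222 bbbb", "2222 bbbb eeee"], ["2222 bbbb eeee", "2222 bbbb"]) {
    my ($current, $candidate) = @$pair;
    print keep_longer_trailing($current, $candidate), "\n";
    print keep_longer_block($current, $candidate), "\n";
}
```

Both helpers return the same string for the same arguments, so the choice between the two forms is a matter of readability.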
 
That's what I thought. I have no idea why it worked this way and not the way you did it. Oh well, I'm just happy I got it to work.

Thanks again dkyrtata.
 