Tek-Tips is the largest IT community on the Internet today!

Sort a file

szzxy (Technical User, CA), Aug 14, 2010

Hello, I have the following input file.
gene1 rs1
gene2 rs2
gene1 rs3
gene1 rs4

And I want to write a program to make it look like this:

gene1 rs1, rs3, rs4
gene2 rs2

That is, put all the rs values for the same gene on one row.
I parsed the file, storing the gene column in $gene, and wrote the following script. Please disregard context and syntax errors for now.
...
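foreach my $gene (@genes) {
    # $temp is reset on every iteration, so only the literal "SLCO4A1" ever matches
    my $temp = "SLCO4A1";
    if ($gene eq $temp) {
        # $rs is assumed to come from the elided parsing code above
        push(@contents, "$gene $rs\n");
        $temp = $gene;
    }
}
foreach my $line (@contents) {
    print $line;
}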
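The idea behind this attempt (remember the previous gene in $temp and compare against it) can work, but only when all lines for a gene are adjacent, which is presumably where the thread title "Sort a file" comes from. Here is a minimal sketch of that sort-then-group approach, assuming two whitespace-separated columns per line; the sub name group_sorted is hypothetical, not from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: sort "gene rs" lines by gene, then collapse each run
# of identical genes into a single "gene rs1, rs2, ..." line.
sub group_sorted {
    my @lines = @_;

    my @records;
    for my $i (0 .. $#lines) {
        my ($gene, $rs) = split ' ', $lines[$i];
        next unless defined $rs;            # skip blank or malformed lines
        push @records, [ $gene, $rs, $i ];  # keep the input position
    }

    # Sort by gene name, then by input position so rs values keep their order.
    my @sorted = sort { $a->[0] cmp $b->[0] or $a->[2] <=> $b->[2] } @records;

    my (@out, @rs, $prev);
    for my $rec (@sorted) {
        my ($gene, $rs) = @$rec;
        if (defined $prev && $gene ne $prev) {
            # The gene changed: emit the finished group and start a new one.
            push @out, "$prev " . join(', ', @rs);
            @rs = ();
        }
        push @rs, $rs;
        $prev = $gene;
    }
    push @out, "$prev " . join(', ', @rs) if defined $prev;   # last group
    return @out;
}

print "$_\n" for group_sorted("gene1 rs1", "gene2 rs2", "gene1 rs3", "gene1 rs4");
```

Because the sort uses cmp, a name like gene10 would sort before gene2; a natural-order sort would need a custom comparison.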


Thanks a lot,
szzxy
 
Well, this may be a bit repetitive, but here it is again. I have the following input file.
gene1 rs1
gene2 rs2
gene1 rs3
gene1 rs4

And I want to write a program to make it look like:

gene1 rs1, rs3, rs4
gene2 rs2


Could anyone please help with this?
Thx,
szzxy
 
You could do this with a hash of arrays or a hash of hashes. If you don't mind duplicate values in the second column being kept as-is, I'd just use the hash of arrays method.
Getting the values back out of nested data structures can be tricky; see perldoc perldsc for examples.

The hash of hashes method instead stores each distinct value once and increments a counter for it every time the same gene/value pair appears again in the second column (as with the last line of the data I added).
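A minimal sketch of the hash-of-hashes variant on its own, assuming the same two-column input; the sub name count_rs is hypothetical. Each inner key is a distinct rs value and each inner value is how many times it was seen for that gene:

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash of hashes, $seen{gene}{rs} = count,
# so repeated (gene, rs) pairs are counted instead of stored twice.
sub count_rs {
    my %seen;
    for my $line (@_) {
        my ($gene, $rs) = split ' ', $line;
        next unless defined $rs;      # skip blank or malformed lines
        $seen{$gene}{$rs}++;
    }
    return \%seen;
}

my $seen = count_rs("gene1 rs1", "gene2 rs2", "gene1 rs3", "gene1 rs4", "gene1 rs1");

# Hash order is not defined, so sort both levels for stable output.
for my $gene (sort keys %$seen) {
    my $rs_for = $seen->{$gene};
    print "$gene ", join(', ', map { "$_ ($rs_for->{$_})" } sort keys %$rs_for), "\n";
}
```

With this data, gene1 rs1 gets a count of 2, while the hash of arrays would simply list rs1 twice.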


Code:
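use strict;
use warnings;
use Data::Dumper;

my %genes;
# Read first, then chomp: chomp() inside the while condition returns the number
# of characters removed, so a last line without a newline would end the loop early.
while ( my $record = <DATA> ) {
    chomp $record;
    next unless $record =~ /\S/;                  # skip blank lines
    my ( $gene, $rs ) = split( ' ', $record );    # split on whitespace, ignoring leading spaces
#    $genes{ $gene }{ $rs }++;        # hash of hashes: counts each rs per gene
    push @{ $genes{ $gene } }, $rs;   # hash of arrays: keeps every rs in input order
}

print Dumper( \%genes ); # dump the whole data structure

# print all values for gene1
print "$_\n" for @{ $genes{ 'gene1' } };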

__DATA__
gene1    rs1
gene2    rs2
gene1    rs3
gene1    rs4
gene1    rs1
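To print every gene in the format asked for ("gene1 rs1, rs3, rs4"), the hash of arrays only needs one more loop. A minimal sketch, assuming the same input; the sub name group_by_gene is hypothetical, and the extra @order array is there only because Perl hashes do not remember insertion order:

```perl
use strict;
use warnings;

# Hypothetical helper: group "gene rs" lines into a hash of arrays and return
# one "gene rs1, rs2, ..." string per gene, in the order genes first appear.
sub group_by_gene {
    my (%genes, @order);
    for my $line (@_) {
        my ($gene, $rs) = split ' ', $line;
        next unless defined $rs;                         # skip blank or malformed lines
        push @order, $gene unless exists $genes{$gene};  # remember first sighting
        push @{ $genes{$gene} }, $rs;                    # autovivifies the array ref
    }

    my @out;
    for my $gene (@order) {
        push @out, "$gene " . join(', ', @{ $genes{$gene} });
    }
    return @out;
}

print "$_\n" for group_by_gene("gene1 rs1", "gene2 rs2", "gene1 rs3", "gene1 rs4");
```

If sorted output is preferred instead, replacing @order with sort keys %genes gives that without the extra array.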
 