parsing into a hash sans bioperl

Status
Not open for further replies.

torstens

Technical User
Oct 9, 2006
26
US
I want to parse a FASTA-like file into a hash (without using bioperl). The file I'm working with is similar to...

>TTHERM_01213980
ATGGAGTAGATTTAATAAGACTAATAAGGATTGATTTCATGGTACTGTTATAATGCATAA

>TTHERM_00697570
ATGAATAAATATACTCTAATTACTTTAGGAGTTTGTATGCTTATAGTTAATGGGTTTTTG
AATAAGCATACCTTTTAATTATCTAACCACTAAACTGGCTTTGATTTATCTCTGTGTGCC

but the sequences are very long, and there are 40+ of them. I want the key to be the name (e.g. TTHERM_01213980) and the value to be the sequence, with all of them in one hash.
I've tried a few approaches, but I won't post them because they're long and don't work.
I appreciate any help I can get.

Thanks
 
Is the data all on one line or split into multiple lines like the second example above?
 
Code:
open FILE, '<', 'data.txt' or die "Can't open data.txt: $!";

my %hash;
my $key = '';
while (<FILE>) {
	chomp;
	next if /^\s*$/;
	
	if (/^>\s*(\w+)/) {
		die "Duplicate Key Found: $1\n"  if exists $hash{$1};
		$key = $1;
		$hash{$key} = [];
		next;
	}
	
	if ($key eq '') {
		die "Data Found before a key\n";
	}
	
	push @{$hash{$key}}, $_;
}

close FILE;

Enjoy,
- Miller
 
I wish I could give you more than one star for that. Thanks Miller H
 
How would I concatenate the values of this hash so that rather than an array I would have a single scalar sequence?
 
Change the two lines doing array operations to string operations:

Code:
$hash{$key} = [];

to

Code:
$hash{$key} = '';

and

Code:
push @{$hash{$key}}, $_;

to

Code:
$hash{$key} .= $_;

- Miller
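
For reference, here is a rough sketch of how the whole thing might look with the string version applied. It is not Miller's exact code: the sub name parse_fasta, the %seqs hash, and the in-memory filehandle holding the sample records from the question are all assumptions made so the example runs on its own. With a real file you would open data.txt instead.

```perl
use strict;
use warnings;

# Parse FASTA-like text from an open filehandle into a hash:
# key = record name (the word after '>'), value = the sequence as one string.
# parse_fasta is a hypothetical helper name, not something from the thread.
sub parse_fasta {
    my ($fh) = @_;
    my %seqs;
    my $key = '';
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines

        if ($line =~ /^>\s*(\w+)/) {       # a header line starts a new record
            die "Duplicate Key Found: $1\n" if exists $seqs{$1};
            $key = $1;
            $seqs{$key} = '';              # empty string instead of an array ref
            next;
        }

        die "Data Found before a key\n" if $key eq '';
        $seqs{$key} .= $line;              # append this line to the current sequence
    }
    return \%seqs;
}

# Sample records from the question, read through an in-memory filehandle
# (an assumption for the demo; a real script would open the data file).
my $sample = <<'END';
>TTHERM_01213980
ATGGAGTAGATTTAATAAGACTAATAAGGATTGATTTCATGGTACTGTTATAATGCATAA

>TTHERM_00697570
ATGAATAAATATACTCTAATTACTTTAGGAGTTTGTATGCTTATAGTTAATGGGTTTTTG
AATAAGCATACCTTTTAATTATCTAACCACTAAACTGGCTTTGATTTATCTCTGTGTGCC
END

open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
my $seqs = parse_fasta($fh);
close $fh;

# Print each record name with the length of its joined sequence.
for my $name (sort keys %$seqs) {
    printf "%s\t%d bases\n", $name, length $seqs->{$name};
}
```

Each sequence ends up as a single scalar, so something like $seqs->{TTHERM_00697570} gives the two lines of that record joined together with no newline in between.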
 