Problems with my hash

Status
Not open for further replies.

torstens

Technical User
Oct 9, 2006
26
US
Hello all,

I'm creating a program that takes a DNA sequence, goes through it codon by codon, counting the number of codons for each amino acid.

Here's the code. It's long but simple, and I'd really appreciate any help I can get. The output follows the code; essentially, it isn't counting: every count stays at 1.

Code:

##########################################
# Load Sequence
##########################################

my $DNA;

$DNA = "ATGACCCCAATACGCAAAACTAACCCCCTAATAAAATTAATTAACCACTCATTCATCGACCTCCCCACCCCATCCAACATCTCCGCATGATGAAACTTCGGCTCACTCCTTGGCGCCTGCCTGATCCTCCAAATCACCACAGGACTATTCCTAGCCATGCACTACTCACCAGACGCCTCAACCGCCTTTTCATCAATCGCCCACATCACTCGAGACGTAAATTATGGCTGAATCATCCGCTACCTTCACGCCAATGGCGCCTCAATATTCTTTATCTGCCTCTTCCTACACATCGGGCGAGGCCTATATTACGGATCATTTCTCTACTCAGAAACCTGAAACATCGGCATTATCCTCCTGCTTGCAACTATAGCAACAGCCTTCATAGGCTATGTCCTCCCGTGAGGCCAAATATCATTCTGAGGGGCCACAGTAATTACAAACTTACTATCCGCCATCCCATACATTGGGACAGACCTAGTTCAATGAATCTGAGGAGGCTACTCAGTAGACAGTCCCACCCTCACACGATTCTTTACCTTTCACTTCATCTTGCCCTTCATTATTGCAGCCCTAGCAACACTCCACCTCCTATTCTTGCACGAAACGGGATCAAACAACCCCCTAGGAATCACCTCCCATTCCGATAAAATCACCTTCCACCCTTACTACACAATCAAAGACGCCCTCGGCTTACTTCTCTTCCTTCTCTCCTTAATGACATTAACACTATTCTCACCAGACCTCCTAGGCGACCCAGACAATTATACCCTAGCCAACCCCTTAAACACCCCTCCCCACATCAAGCCCGAATGATATTTCCTATTCGCCTACACAATTCTCCGATCCGTCCCTAACAAACTAGGAGGCGTCCTTGCCCTATTACTATCCATCCTCATCCTAGCAATAATCCCCATCCTCCATATATCCAAACAACAAAGCATAATATTTCGCCCACTAAGCCAATCACTTTATTGACTCCTAGCCGCAGACCTCCTCATTCTAACCTGAATCGGAGGACAACCAGTAAGCTACCCTTTTACCATCATTGGACAAGTAGCATCCGTACTATACTTCACAACAATCCTAATCCTAATACCAACTATCTCCCTAATTGAAAACAAAATACTCAAATAATAAT";



##########################################
# Populate Data Set
##########################################

my %MasterHash = ();
my $AA_Count = 0;
my $CD_Occurences = 0;


for(my $i=0; $i < (length($DNA) - 2) ; $i += 3) {

my $Codon = substr($DNA,$i,3);

my $AA = codon2aa( $Codon );

print "\nAmino Acid: $AA";
print "\nCodon: $Codon\n";


### Setting up AA count

if ($MasterHash{$AA}) { #tests whether AA key has already been created

$MasterHash{$AA}{Count}++; #if so, add to AA count

} else { #if not...

$MasterHash{$AA}{Symbol} = $AA; #set up symbol
$MasterHash{$AA}{Count} = 1; #set up AA count

}

print "Amino Acid Count: $MasterHash{$AA}{Count}\n";


### Setting up Codon Count

if ($MasterHash{$AA}{Codons}{$Codon}) { #tests whether codon key has already been created

$MasterHash{$AA}{Codons}{$Codon}{Count}++; #if so, add to codon count

} else { #if not...

$MasterHash{$AA}{Codons}{$Codon}{Count} = 1; #set up codon count

}

print "Codon Count: $MasterHash{$AA}{Codons}{$Codon}{Count}\n";



%MasterHash = (

$AA => {
Symbol => $AA,
Count => $AA_Count,
Codons => {
$Codon => {
Count => $CD_Count,
},
},
},
);
}










##########################################
# Print Data Set
##########################################

print "\nTest\n";
$TestHash{'Run'};
print "\n";
print $MasterHash{M}{Symbol};
print "\n";
print $MasterHash{M}{Count};
print "\n";
print $MasterHash{M}{Codons};
print "\nEnd Test\n\n";


while( my ($k) = each (%MasterHash) ) {
print "Amino Acid: $k\n";
#print "Amino Acid: $%AVar\n";

}





##########################################
# Subroutines
##########################################






sub codon2aa {
my($codon) = @_;

$codon = uc $codon;

my(%genetic_code) = (

'TCA' => 'S', # Serine
'TCC' => 'S', # Serine
'TCG' => 'S', # Serine
'TCT' => 'S', # Serine
'TTC' => 'F', # Phenylalanine
'TTT' => 'F', # Phenylalanine
'TTA' => 'L', # Leucine
'TTG' => 'L', # Leucine
'TAC' => 'Y', # Tyrosine
'TAT' => 'Y', # Tyrosine
'TAA' => '_', # Stop
'TAG' => '_', # Stop
'TGC' => 'C', # Cysteine
'TGT' => 'C', # Cysteine
'TGA' => '_', # Stop
'TGG' => 'W', # Tryptophan
'CTA' => 'L', # Leucine
'CTC' => 'L', # Leucine
'CTG' => 'L', # Leucine
'CTT' => 'L', # Leucine
'CCA' => 'P', # Proline
'CCC' => 'P', # Proline
'CCG' => 'P', # Proline
'CCT' => 'P', # Proline
'CAC' => 'H', # Histidine
'CAT' => 'H', # Histidine
'CAA' => 'Q', # Glutamine
'CAG' => 'Q', # Glutamine
'CGA' => 'R', # Arginine
'CGC' => 'R', # Arginine
'CGG' => 'R', # Arginine
'CGT' => 'R', # Arginine
'ATA' => 'I', # Isoleucine
'ATC' => 'I', # Isoleucine
'ATT' => 'I', # Isoleucine
'ATG' => 'M', # Methionine
'ACA' => 'T', # Threonine
'ACC' => 'T', # Threonine
'ACG' => 'T', # Threonine
'ACT' => 'T', # Threonine
'AAC' => 'N', # Asparagine
'AAT' => 'N', # Asparagine
'AAA' => 'K', # Lysine
'AAG' => 'K', # Lysine
'AGC' => 'S', # Serine
'AGT' => 'S', # Serine
'AGA' => 'R', # Arginine
'AGG' => 'R', # Arginine
'GTA' => 'V', # Valine
'GTC' => 'V', # Valine
'GTG' => 'V', # Valine
'GTT' => 'V', # Valine
'GCA' => 'A', # Alanine
'GCC' => 'A', # Alanine
'GCG' => 'A', # Alanine
'GCT' => 'A', # Alanine
'GAC' => 'D', # Aspartic Acid
'GAT' => 'D', # Aspartic Acid
'GAA' => 'E', # Glutamic Acid
'GAG' => 'E', # Glutamic Acid
'GGA' => 'G', # Glycine
'GGC' => 'G', # Glycine
'GGG' => 'G', # Glycine
'GGT' => 'G', # Glycine
);

if(exists $genetic_code{$codon}) {
return $genetic_code{$codon};
}else{

print STDERR "Bad codon \"$codon\"!!\n";
#exit;
}
}


OUTPUT:

...
Amino Acid: _
Codon: TGA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: I
Codon: ATC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: G
Codon: GGA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: G
Codon: GGA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: Q
Codon: CAA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: P
Codon: CCA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: V
Codon: GTA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: S
Codon: AGC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: Y
Codon: TAC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: P
Codon: CCT
Amino Acid Count: 1
Codon Count: 1

Amino Acid: F
Codon: TTT
Amino Acid Count: 1
Codon Count: 1

Amino Acid: T
Codon: ACC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: I
Codon: ATC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: I
Codon: ATT
Amino Acid Count: 1
Codon Count: 1

Amino Acid: G
Codon: GGA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: Q
Codon: CAA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: V
Codon: GTA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: A
Codon: GCA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: S
Codon: TCC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: V
Codon: GTA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: L
Codon: CTA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: Y
Codon: TAC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: F
Codon: TTC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: T
Codon: ACA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: T
Codon: ACA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: I
Codon: ATC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: L
Codon: CTA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: I
Codon: ATC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: L
Codon: CTA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: I
Codon: ATA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: P
Codon: CCA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: T
Codon: ACT
Amino Acid Count: 1
Codon Count: 1

Amino Acid: I
Codon: ATC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: S
Codon: TCC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: L
Codon: CTA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: I
Codon: ATT
Amino Acid Count: 1
Codon Count: 1

Amino Acid: E
Codon: GAA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: N
Codon: AAC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: K
Codon: AAA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: I
Codon: ATA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: L
Codon: CTC
Amino Acid Count: 1
Codon Count: 1

Amino Acid: K
Codon: AAA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: _
Codon: TAA
Amino Acid Count: 1
Codon Count: 1

Amino Acid: _
Codon: TAA
Amino Acid Count: 1
Codon Count: 1

Test




End Test

Amino Acid: M
Amino Acid: _
 
Is this school/course work?

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Your problem is most likely this:

Code:
    %MasterHash = (
        $AA => {
            Symbol => $AA,
            Count  => $AA_Count,
            Codons => {
                $Codon => {
                    Count => $CD_Count,
                },
            },
        },
    );

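inside of your for() loop. That assignment tells Perl to "set
%MasterHash to equal this list", so every key that earlier
iterations added or updated is wiped out, and on each pass
%MasterHash ends up holding only the current amino acid and codon.
The replacement also stores $AA_Count (always 0) and $CD_Count
(never assigned anywhere) as the counts, so even when the same
amino acid comes up twice in a row, the ++ only takes it from
0 (or undef) to 1. That's why every count in your output is 1.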
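
For what it's worth, here is a minimal sketch of the counting loop with that assignment removed, so the nested hash is only ever updated in place. The short test sequence, the cut-down codon table, and the count_codons() name are my own assumptions for the demo; in your script you would keep your full codon2aa() table and your real sequence.

```perl
use strict;
use warnings;

# Assumption: a deliberately tiny codon table, just enough for the demo
# sequence below; the full table from codon2aa() would be used in practice.
my %genetic_code = (
    ATG => 'M',
    GGA => 'G',
    GGC => 'G',
    TAA => '_',
);

# Count amino acids and codons for one sequence, updating the hash in place.
# count_codons() is a hypothetical helper name, not part of the original post.
sub count_codons {
    my ($dna) = @_;
    my %master;

    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = uc substr($dna, $i, 3);
        my $aa    = $genetic_code{$codon};
        next unless defined $aa;    # skip codons missing from the table

        # Autovivification creates the nested levels on first use,
        # so no existence test or initial "= 1" is needed.
        $master{$aa}{Symbol} = $aa;
        $master{$aa}{Count}++;
        $master{$aa}{Codons}{$codon}{Count}++;

        # Note: no "%master = ( ... )" here; that would discard
        # everything counted in earlier iterations.
    }
    return \%master;
}

# Demo sequence (an assumption): ATG GGA GGA GGC TAA
my $counts = count_codons('ATGGGAGGAGGCTAA');

for my $aa (sort keys %$counts) {
    print "Amino Acid: $aa ($counts->{$aa}{Count})\n";
    for my $codon (sort keys %{ $counts->{$aa}{Codons} }) {
        print "  $codon: $counts->{$aa}{Codons}{$codon}{Count}\n";
    }
}
```

With this input, G should end up with a count of 3 (GGA twice, GGC once). As a side note, the "Amino Acid: M" line at the end of your output most likely comes from print $MasterHash{M}{Symbol}; in your test block: reading a nested key autovivifies the intermediate $MasterHash{M} entry, so M shows up as a key even though the loop had already thrown it away.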

-------------
Cuvou.com | The NEW Kirsle.net
 