Hi! I have a text file that holds 100 sequences, and
each sequence is 8 nucleotides long.
Like:
AGATAGCA
ACACAGTA
TCCGTGCT
ATTGGCTA
GACTTGAA
........
I want to calculate the frequency of the nucleotides A, T, C, and G
for each column (NOT each row) and print them out.
My code is below, but it doesn't work. Could anybody give
me some suggestions? Thank you very much!
Alex
Code:
#!/usr/bin/perl -w
# Determining frequency of nucleotides
my (@short,$x,$position,$base);
open (SHORT, "< outfile8.txt");
chomp (@short = <SHORT>);
close SHORT;
for ( $position = 0 ; $position < 7 ; ++$position ){
    for ($x=0; $x<=$#short; $x++){
        $count_of_A = 0; # Initialize the counts.
        $count_of_C = 0;
        $count_of_G = 0;
        $count_of_T = 0;
        $base = substr($short[$x], $position, 1);
        if ( $base eq 'A' ) {
            ++$count_of_A;
        } elsif ( $base eq 'C' ) {
            ++$count_of_C;
        } elsif ( $base eq 'G' ) {
            ++$count_of_G;
        } elsif ( $base eq 'T' ) {
            ++$count_of_T;
        }
    }
    print "A = $count_of_A\n"; # print the results
    print "C = $count_of_C\n";
    print "G = $count_of_G\n";
    print "T = $count_of_T\n";
}
# exit the program
exit;
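Two things in the script above look like the likely cause of wrong output. The four counters are reset to 0 inside the inner loop, so every row wipes the tally and only the last sequence is counted for each column. The outer loop also stops at $position < 7, so it visits 7 of the 8 columns. What follows is a minimal sketch of one way to restructure it, not the only fix. The helper name column_counts is made up for illustration, the five sample sequences stand in for reading outfile8.txt, and the sketch assumes every line has the same width.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count A, C, G, T in each column of a list of equal-width sequences.
# Returns a list of hash references, one per column.
# (column_counts is a hypothetical helper name, not part of the original script.)
sub column_counts {
    my @seqs = @_;
    return () unless @seqs;
    my $width = length $seqs[0];    # assumption: all sequences share this width
    my @counts;
    for my $pos (0 .. $width - 1) {
        # Reset the tallies once per column, before scanning the rows.
        my %n = (A => 0, C => 0, G => 0, T => 0);
        for my $seq (@seqs) {
            my $base = substr($seq, $pos, 1);
            $n{$base}++ if exists $n{$base};    # ignore anything that is not A/C/G/T
        }
        push @counts, \%n;
    }
    return @counts;
}

# Sample input: the five sequences quoted at the top of the post.
my @short = qw(AGATAGCA ACACAGTA TCCGTGCT ATTGGCTA GACTTGAA);

my @counts = column_counts(@short);
for my $pos (0 .. $#counts) {
    my $n = $counts[$pos];
    print "Column ", $pos + 1, ": ",
        join(", ", map { "$_ = $n->{$_}" } qw(A C G T)), "\n";
}
```

For the five sample sequences, the first line printed is "Column 1: A = 3, C = 0, G = 1, T = 1". To run it on the real data, @short could be filled from outfile8.txt with the same open / chomp / close lines as in the original script.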
outfile8.txt holds sequences:
ACGTGACC
TGCAGATT
TAGTTTGA
TTTTCAAA
GACTTCAG
CCAGTTTA
TTACTTTG
TGCCGCGT
TCACGGCG
AGTCGTCA
CCCTAATT
CATTAGCG
TGGTTAAT
TGGGTTAC
GACTACCC
TAGCGCTG
TACTGTTC
AGTGAGCC
CCTTTAAC
GAGCGTCG
GTACAGTT
ATTCTCCA
GACACGCG
TACTAGAA
GTTCATCA
TATAGTCC
CTAATTTT
TACCCATG
CATGTATG
GAACTACG
CGGTTAGT
CACATGCG
TGTGACCT
TTCATCTA
GCCCTGGG
TTGGGAGC
CTACTACC
AGAACCAC
AGTCGACT
TTGATGCT
CCCTCCTA
AGCAAGTT
AGAGCGAA
TACGACAA
CGATTAGC
TACTTAGA
ACCCACCG
CTGAGTTC
TGTCATAA
CTATTGCC
ATCTGTAC
TAACTCAG
CTTGTACC
CATCATTG
GCTTGGAG
TATGAAGC
CAGACTAT
GAATCGTG
AACGATCC
ATTTGTCC
AGGGCCGG
GATATGTC
GATTCTGG
TCCAATCC
CGCGGTAA
TAGGGTCT
TCGGTATT
TCGTTGGT
TGACTGGG
GGTTATTA
ACCATGCG
TTTCAGGC
TACTGAGA
GGTTATAC
CACGCAGT
GCGAACCA
ACCCAGTA
CCGGGCTG
TGGTCTTC
AGATAACA
AAATACTA
TACGGAAG
GGAATAAA
TCTTGATA
TGGAAGGA
AGATGACG
AAACTCAA
TGAAGCGC
CGGTGCGG
TGTCGTCT
AATACGGA
TAAGTGCA
GCTGGTAT
GATCAGAC
CGTTTGGG
AGATAGCA
ACACAGTA
TCCGTGCT
ATTGGCTA
GACTTGAA