A, C, T & G - counting occurrences in columns

Status
Not open for further replies.

duncdude

Programmer
Jul 28, 2003
1,979
GB
Hi

I have just solved this problem in the Perl forum, but I think it would be ideal to crack it with an awk script.

Basically, I need to count the number of times A, C, T & G appear in each of the columns.

sample data:-
ACGTGACC
TGCAGATT
TAGTTTGA
TTTTCAAA
GACTTCAG
CCAGTTTA
TTACTTTG
TGCCGCGT
TCACGGCG
AGTCGTCA
CCCTAATT
CATTAGCG
TGGTTAAT
TGGGTTAC
GACTACCC
TAGCGCTG
TACTGTTC
AGTGAGCC

Can anyone help?
 
Hi PHV

Something like this if possible:-

A: conclusion
6 22
1 34
0 23
2 23
5 25
3 19
4 27

C: conclusion
0 21
6 32
1 20
2 28
5 20
3 25
4 13

T: conclusion
1 17
6 26
0 37
2 29
5 26
3 34
4 33

G: conclusion
0 19
6 20
1 29
2 20
5 29
3 22
4 27


Kind Regards
Duncan
 
Because Perl hashes are unordered, the column numbers come out in no particular order, but this is the general idea.
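
The ordering issue can be avoided in awk too. awk's `for (k in array)` also visits keys in an unspecified order, so a rough sketch is to loop over the column numbers numerically instead. The function name `count_columns` and the `width` and `count` variables are made up for illustration, and the sketch assumes one sequence per line on standard input:

```shell
# Hypothetical helper: reads sequences on stdin, prints per-letter column counts.
count_columns() {
  awk '
    {
      n = length($0)
      if (n > width) width = n          # remember the widest line seen
      for (i = 1; i <= n; i++)
        count[i, substr($0, i, 1)]++    # key is (column, letter)
    }
    END {
      nletters = split("A C T G", letters, " ")
      for (l = 1; l <= nletters; l++) {
        print letters[l] ": conclusion"
        # numeric loop instead of "for (k in count)", so columns come out in order
        for (i = 1; i <= width; i++)
          print i - 1, count[i, letters[l]] + 0
      }
    }
  '
}
```

For example, `printf 'ACGT\nAAGT\n' | count_columns` prints the A block as `0 2`, `1 1`, `2 0`, `3 0`, with the columns in ascending order.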

Just seems a perfect awk problem to me!?


Kind Regards
Duncan
 
I don't understand the result vs the sample data you posted.
 
Sorry, I am being a little vague.

This is the full dataset I used:-

ACGTGACC
TGCAGATT
TAGTTTGA
TTTTCAAA
GACTTCAG
CCAGTTTA
TTACTTTG
TGCCGCGT
TCACGGCG
AGTCGTCA
CCCTAATT
CATTAGCG
TGGTTAAT
TGGGTTAC
GACTACCC
TAGCGCTG
TACTGTTC
AGTGAGCC
CCTTTAAC
GAGCGTCG
GTACAGTT
ATTCTCCA
GACACGCG
TACTAGAA
GTTCATCA
TATAGTCC
CTAATTTT
TACCCATG
CATGTATG
GAACTACG
CGGTTAGT
CACATGCG
TGTGACCT
TTCATCTA
GCCCTGGG
TTGGGAGC
CTACTACC
AGAACCAC
AGTCGACT
TTGATGCT
CCCTCCTA
AGCAAGTT
AGAGCGAA
TACGACAA
CGATTAGC
TACTTAGA
ACCCACCG
CTGAGTTC
TGTCATAA
CTATTGCC
ATCTGTAC
TAACTCAG
CTTGTACC
CATCATTG
GCTTGGAG
TATGAAGC
CAGACTAT
GAATCGTG
AACGATCC
ATTTGTCC
AGGGCCGG
GATATGTC
GATTCTGG
TCCAATCC
CGCGGTAA
TAGGGTCT
TCGGTATT
TCGTTGGT
TGACTGGG
GGTTATTA
ACCATGCG
TTTCAGGC
TACTGAGA
GGTTATAC
CACGCAGT
GCGAACCA
ACCCAGTA
CCGGGCTG
TGGTCTTC
AGATAACA
AAATACTA
TACGGAAG
GGAATAAA
TCTTGATA
TGGAAGGA
AGATGACG
AAACTCAA
TGAAGCGC
CGGTGCGG
TGTCGTCT
AATACGGA
TAAGTGCA
GCTGGTAT
GATCAGAC
CGTTTGGG
AGATAGCA
ACACAGTA
TCCGTGCT
ATTGGCTA
GACTTGAA


Kind Regards
Duncan
 
Doesn't really seem to be an improvement on the Perl version:
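Code:
# FS="" splits each record into one-character fields
# (a common extension, e.g. in gawk; POSIX leaves it unspecified)
BEGIN { FS="" }
{ for(i=1;i<=NF;i++)
    a[i,$i]++          # count letter $i in column i
  columns = NF         # assumes every line has the same length
}
END {
  j=1
  # walk the letters of "ACTG"; substr returns "" past the end, which ends the loop
  while ( letter = substr("ACTG",j++,1))
  { print letter ": conclusion"
    for (i=1;i<=columns;i++)
      print i-1, a[i,letter] + 0   # +0 prints 0 when a letter never occurs in a column
  }
}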
 
Never mind - thank you anyway futurelet!


Kind Regards
Duncan
 
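And my attempt:
# only count single-field lines that are exactly 8 characters long
NF==1 && length==8{for(i=1;i<9;++i)++a[i,substr($1,i,1)]}
END{
split("A C T G",x)
for(l=1;l<5;++l){
print x[l]": conclusion"
# +0 so a letter never seen in a column prints 0 rather than an empty string
for(i=1;i<9;++i) print i-1, a[i,x[l]]+0
}
}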

Hope This Helps, PH.
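
One quick way to sanity-check any of these scripts: every sequence contributes exactly one letter to each column, so the four counts for a column should add up to the number of sequences (the requested figures for column 0, 23 + 21 + 37 + 19, do add up to the 100 lines of the full dataset). A minimal sketch of that check follows; the function name `column_totals` and the one-row-per-column layout are my own assumptions, not something from the posts above:

```shell
# Hypothetical helper: one row per column with A, C, T, G counts and their sum.
column_totals() {
  awk '
    {
      n = length($0)
      if (n > width) width = n          # remember the widest line seen
      for (i = 1; i <= n; i++)
        count[i, substr($0, i, 1)]++    # key is (column, letter)
    }
    END {
      print "col A C T G total"
      for (i = 1; i <= width; i++) {
        a = count[i, "A"] + 0; c = count[i, "C"] + 0
        t = count[i, "T"] + 0; g = count[i, "G"] + 0
        # only A, C, T and G are summed; any other character is ignored here
        print i - 1, a, c, t, g, a + c + t + g
      }
    }
  '
}
```

Running `printf 'ACGT\nAAGT\n' | column_totals` gives:

```
col A C T G total
0 2 0 0 0 2
1 1 1 0 0 2
2 0 0 0 2 2
3 0 0 2 0 2
```

A total that differs from the line count would point to short lines or to characters other than A, C, T and G in the input.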