This is a tangent to this thread:
As I was testing code, I noticed that the pair "AC" seemed to start most of the 8 base-pair sequences. I would have thought that, since the pairs are chosen at random, any base pair could start a sequence. Maybe there is a flaw in my code, or in the hash I use to count how often each base pair occurs:
Code:
#!perl
use strict;
use warnings;
#use Benchmark qw(:all);
#use Data::Dump qw(dump);

my @fullset = qw(AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT);
my $probab  = 0.851;   # chance of taking the "major" pair at each position
my %results = ();      # unique motifs generated so far
my %pairs;             # $pairs{position}{pair} = count (autovivified below)

for (0..99) {
    my %base_pairs = ();
    my @motif      = ();
    # draw pairs at random until 8 distinct ones have been collected
    until (scalar keys %base_pairs == 8) {
        my $index = int(rand(@fullset));
        $base_pairs{$fullset[$index]} = $fullset[$index];
    }
    # the order of keys() is set by Perl's hash internals
    my @base_pairs = keys %base_pairs;
    my @major = @base_pairs[0..3];
    my @minor = @base_pairs[4..7];
    for my $i (0..3) {
        push @motif, rand(1) < $probab ? $major[$i] : $minor[$i];
    }
    my $motif = join('', @motif);
    redo if exists $results{$motif};   # duplicate motif: rerun this iteration
    $results{$motif} = $motif;
    # tally pairs only for motifs that were kept, so duplicates are not counted
    $pairs{$_}{$motif[$_]}++ for 0..3;
    # print "major = @major , minor = @minor , results = $motif\n";
}
#print "$_\n" for keys %results;
print "\n\n";
for my $pos (sort { $a <=> $b } keys %pairs) {
    print "Frequency of pairs at position @{[$pos+1]}:\n";
    for my $pair (sort { $pairs{$pos}{$b} <=> $pairs{$pos}{$a} } keys %{$pairs{$pos}}) {
        print " $pair = $pairs{$pos}{$pair}\n";
    }
    print "\n";
}
Quite literally, "AC" is always the most frequent pair in the first position, as in this example printout:
Code:
Frequency of pairs at position 1:
AC = 47
CC = 18
AG = 12
GG = 8
GA = 6
TG = 4
AA = 3
CT = 3
AT = 1
TT = 1
CG = 1
TA = 1
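For scale, a back-of-the-envelope comparison (ignoring the duplicate-motif rejection, and noting that the position-1 counts above total 105 rather than 100): if every pair were equally likely to end up in the first slot, each would be expected about 6.25 times. If instead, purely as a hypothesis, "AC" came back first from `keys` whenever it was among the 8 drawn pairs, it would be expected roughly 43 times, not far from the 47 observed.

```latex
% uniform case: each of the 16 pairs equally likely in slot 1
\mathrm{E}[\text{count at position 1}] = 100 \times \frac{1}{16} = 6.25

% hypothetical case: AC is always major[0] whenever it is among the 8 drawn pairs
P(\text{AC drawn}) = \frac{8}{16} = \frac{1}{2}, \qquad
\mathrm{E}[\text{count}_{\mathrm{AC}}] \approx 100 \times \frac{1}{2} \times 0.851 \approx 42.6
```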
There appear to be other patterns in the frequencies of the other pairs too. Any thoughts on this? I feel like I must be missing something obvious.
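One way to test whether the skew comes from the ordering rather than from the random draw: a minimal sketch that repeats the 8-pair draw many times and tallies which pair lands first (the slot that becomes `$major[0]`), once using `keys` order as in the script and once using `shuffle` from the core module `List::Util`. The helper names `pick_via_keys`, `pick_via_shuffle` and `first_counts` are hypothetical. `perldoc -f keys` only describes the order as "apparently random" and implementation-specific; it is not promised to be a uniform shuffle, so the `keys` row may come out lopsided (how much depends on the Perl version and hash seed), while the shuffle row should land near 625 per pair for 10,000 trials.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @fullset = qw(AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT);

# Hypothetical helper: draw 8 distinct pairs the same way as the original
# until-loop, then return them in keys() order.
sub pick_via_keys {
    my %seen;
    while (keys(%seen) < 8) {
        my $p = $fullset[ int rand @fullset ];
        $seen{$p} = $p;
    }
    return keys %seen;
}

# Hypothetical helper: shuffle all 16 pairs and keep the first 8,
# so every pair is equally likely in every slot.
sub pick_via_shuffle {
    return (shuffle @fullset)[0 .. 7];
}

# Tally which pair ends up first over $trials draws.
sub first_counts {
    my ($picker, $trials) = @_;
    my %count;
    $count{ ($picker->())[0] }++ for 1 .. $trials;
    return \%count;
}

my $trials = 10_000;
for my $method (['keys', \&pick_via_keys], ['shuffle', \&pick_via_shuffle]) {
    my ($name, $picker) = @$method;
    my $c = first_counts($picker, $trials);
    printf "%-7s: %s\n", $name,
        join ' ', map { sprintf '%s=%d', $_, $c->{$_} // 0 } @fullset;
}
```

If the `keys` row turns out skewed, replacing the `until` loop and `my @base_pairs = keys %base_pairs;` with a shuffle-based draw such as `my @base_pairs = (shuffle @fullset)[0..7];` (plus `use List::Util qw(shuffle);`) is one way to give every pair the same chance in every slot.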