Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Help on Random motif

Status
Not open for further replies.

Everwood

Technical User
Jul 18, 2005
78
US
Hi all,

I want to generate a random motif of width 8. Each position will have a dominant letter with probability around 0.85. Repeat this to generate 100 motifs.

For example, I have a motif like:

ATCGTGCC

The first nucleotide is A and its probability is 0.85, which means the probability of other letters T or C or G which can be selected to appear on the first position is 0.05.

My code for this is given below. I borrowed it from and made a slight change. However, it resulted in some errors:

********************************
"use" not allowed in expression at generate_random_motif.pl line 52, at end of line
syntax error at generate_random_motif.pl line 52, near "to be
# written to output.
#
# B. Steipe, July 2005. Public domain code.
# =====================================================================

use strict"
BEGIN not safe after errors--compilation aborted at generate_random_motif.pl line 53.
********************************

I need some suggestions on this code or do you have
some other ideas?

Thanks a lot!

Alex





The code

*************************************************
#!/usr/bin/perl
# randommotifs.pl
#
# This program generates randomized instances of pseudomotifs, i.e.
# sequences that are obtained by adding "noise", according to a
specified
# model to a consensus sequence. To keep this reasonably general, we
# proceed through the following steps:
#
# Define consensus motif and alphabet
# For the required number of repetitions
# For each position in motif
# Define an appropriate probability distribution over the
alphabet
# Produce a character according to the distribution
# ...
# ...
#
# The probability distributions are generated to produce a consensus
character
# with a particular "consensus probability" and all other characters
# with equal probabilites (the noise). This is defined in a single
subroutine
# so it is straightforward to change this for a different model of noise
# (e.g. increase the probabilities for "similar" characters). It's easy
# to become creative here and incorporate (biological) knowledge into
the
# noise-model.
#
# Probablities are then converted into cumulative intervals to chose a
# particular character:
#
# Eg: Probabilities A: 0.4, C: 0.3, G: 0.2, T: 0.1
# Intervals A: 0.4 C: 0.7, G: 0.9, T: 1.0
#
# This example can be pictured as
#
# 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
# +----+----+----+----+----+----
+----+----+----+----+
# | A | C | G | T |
# +----+----+----+----+----+----+----+----+----+----+
#
# .. now all we need to do is generate a random number between 0 and 1
# and see in which interval it falls. This then points to the character
to be
# written to output.
#
# B. Steipe, July 2005. Public domain code.
# =====================================================================

use strict;
use warnings;


my $N_Sequences = 100; # Number of sequences to be
generated
my @Motif = split(//,'TTTATAAT'); # This example is simply a
concatenation
# of the -35 and -10 consensus
elements
# of prokaryotic promoters;
replace with
# whatever ...
my @Alphabet = split(//,'ACGT'); # Nucleotides; replace with
whatever ...
my $P_Consensus = 0.85; # Replace with whatever ...

# ====== Globals ==========================
my @Probabilities; # Stores the probability of each
character
# ====== Program ==========================


for (my $i=0; $i < $N_Sequences; $i++) {
for (my $j=0; $j < scalar(@Motif); $j++) {

loadConsensusCharacter($Motif[$j]); # Load the character for
this position
addNoiseToDistribution(); # Modify probabilities
according to noise model
convertToIntervals();
print(getRandomCharacter(rand(1.0))); # Output character
}
print "\n";
}

exit();

# ====== Subroutines =======================
#
sub loadConsensusCharacter {
my ($char) = @_;
my $Found = 'FALSE';

for (my $i=0; $i < scalar(@Alphabet); $i++) {
if ( $char eq $Alphabet[$i]) { # found character in i_th
position in Alphabet
$Probabilities[$i] = 1.0;
$Found = 'TRUE';
} else {
$Probabilities[$i] = 0.0;
}
}
if ($Found eq 'FALSE') {
die("Panic: Motif-Character\"$char\" was not found in Alphabet.
Aborting.\n");
}

return();
}

# ==========================================
sub addNoiseToDistribution {

# This implements a very simple noise model:
# We work on an array in which one element is 1.0 and
# all others are 0.0. The element with 1.0 has the same
# index as the consensus character in the Alphabet.
#
# We change that value to the consensus probability and
# we distribute the remaining probability equally among
# all other elements.
#
# It should be straightforward to implement more elaborate
# noise models, or use a position specific scoring matrix
# or something else here.

my $P_NonConsensus = ( 1.0-$P_Consensus) / (scalar(@Alphabet) - 1);

for (my $i=0; $i < scalar(@Probabilities); $i++) {
if ( $Probabilities[$i] == 1.0 ) { # ... this is the consensus
character
$Probabilities[$i] = $P_Consensus;
} else {
$Probabilities[$i] = $P_NonConsensus;
}
}

return();
}

# ==========================================
sub convertToIntervals {

my $Sum = 0;

# Convert the values of the probabilities array to the top
boundaries
# of intervals having widths proportional to their relative
frequency.
# Numbers range from 0 to 1.0 ...

for (my $i=1; $i < scalar(@Probabilities); $i++) {
$Probabilities[$i] += $Probabilities[$i-1];
}

return();
}

# ==========================================
sub getRandomCharacter {

my ($RandomNumber) = @_;
my $i=0;

# Test which interval contains the RandomNumber ...
# The index of the interval points to the Event we should choose.

for ($i=0; $i < scalar(@Probabilities); $i++) {
if ($Probabilities[$i] > $RandomNumber) { last; }
}

return($Alphabet[$i]);
}
*************************************************
 
I think that your code as wrapped unfortunately and one of the ail-ends of one of the comments s being interpreted as code. The chunk of code quoted as the sytax error is nearly all comment and yet it is being parsed. You need to make sure that every line before "use strict" begins with a # character.

Note that if it has suffered from wrapping, there will almost certainly be other instances so you could save some debugging time by examining all lines following long lines to see if they are inadvertant orphans.

Yours,

fish

&quot;As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs.&quot;
--Maurice Wilkes
 
I add some "#"s to the beginnings of some explanation sentences. But there are still some error messages like:

syntax error at generate_random_motif8.pl line 56, near "generated
my "
Global symbol "@Motif" requires explicit package name at generate_random_motif8.pl line 56.
Global symbol "@Motif" requires explicit package name at generate_random_motif8.pl line 73.
Global symbol "@Motif" requires explicit package name at generate_random_motif8.pl line 75.
Bareword "whatever" not allowed while "strict subs" in use at generate_random_motif8.pl line 64.
Execution of generate_random_motif8.pl aborted due to compilation errors.

Thanks,
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top