Read file in array

Status
Not open for further replies.

mama12

Programmer
Oct 18, 2005
22
NL
Hi all,
What is the best Perl script to read a file (a DNA sequence) into an array and mutate this file by between 7% and 10%?
 
Hi again

How can I read a file into an array and change a percentage (%) of it?

I appreciate any help

regards
 
If you show us some sample data and the sample output that you would like, I'm sure we can help.


Trojan.
 
Hi again

A DNA file looks like AAAACCCCTTTTGGGG.

I want to mutate 7% and 10% of AAAACCCCTTTTGGGG in this file.

If it is C, it has to become A, T, or G.
If it is T, it has to become A, C, or G.

The output can look like this:

AAAACCtCTaTTGGGG

I appreciate any help

regards
 
I'm sorry, I don't really understand genetic research so if you want some help I would appreciate a more detailed explanation.
Do you just want 7% of the characters to be swapped for another random character of the set "ACTG"?
I will assume that is the case and look at creating something that does this for you.
If I have misunderstood then I apologise and ask that you offer a more detailed explanation.
I will post when I have a little code for you.


Trojan.
 
Thanks for your reply.

First read the file into an array, then mutate 7% and 10% of that array.
Yes, I want 7% and 10% of the characters to be swapped for another random character of the set "ACTG", but never for the same character: each swapped character has to become a different one.
Like this:

If it is C, it has to become A, T, or G.
If it is T, it has to become A, C, or G.

I appreciate your work.

regards
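
The rule described above (a swapped base must never stay the same) can be sketched roughly as below. This is only a minimal illustration, not the script from this thread; the helper name different_base is made up for the example, and it assumes the alphabet is exactly A, C, G and T.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the thread):
# return a random base from A/C/G/T that differs from the one passed in.
sub different_base {
    my ($base) = @_;
    # Keep only the three bases that are not the current one
    my @choices = grep { $_ ne uc $base } qw(A C G T);
    return $choices[ int rand @choices ];
}

# Show one possible replacement for each base
print "$_ -> ", different_base($_), "\n" for qw(A C G T);
```

Filtering the current base out first means a single random pick is always valid, so no retry loop is needed.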
 
Can I suggest that you read the first two FAQs on this forum, faq219-2884 and faq219-2889?

Understanding them will enable you to get more useful answers faster.

Yours,

fish

"As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs."
--Maur
 
Try this and see if it does what you want:
Code:
#!/usr/bin/perl -w
use strict;
my $ratio           = 7;  # Percentage error to apply
my $swapped_total   = 0;
my $characters_seen = 0;
my $charset         = "ACGT";
while(<DATA>) {
  chomp;
  # The line below calculates the number of characters to swap in this line
  my $this_line_todo = int(($characters_seen + length($_)) * $ratio / 100)
                     - $swapped_total;
  # Now we need to find the positions of the characters to change
  my %indexes = ();  # Place to store indexes
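  # Collect distinct random positions until there are exactly $this_line_todo
  # (the original "* 2" swapped twice as many characters as were counted)
  $indexes{int(rand(length))}++
    while(scalar(keys %indexes) < $this_line_todo);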
  foreach my $index (keys %indexes) {
    my $char = substr($_,$index,1);
    my $newchar;
    do {
      $newchar = substr($charset, rand(4), 1);
    } while ($char eq $newchar);
    substr($_,$index,1) = lc $newchar;
  }
  $characters_seen += length;
  $swapped_total   += $this_line_todo;
  print $_,"\n";
}
print "Total characters seen is $characters_seen\n";
print "Total characters swapped is $swapped_total\n";
__DATA__
GAATATCCCCATGATCTTTCCCTCAATCGCCCGCTGATAAGTGGGAAGACATCG
GTCGCGCCACACTCGATACCCTGCTCATGGTGGCGCTTGGTCTTCCCTTGGGAAT
GAATATCCCCATGATCTTCCCCTCAATCGACCTGGACGCTGATAAGTGGAAAGACATCG
GTCGCGCCACACTCGATACCCTGCTCACCCGCATTGGCGCTTGGTCTTCCCTTGGGAAT
GAATATCCCCATGATCTTCCCCTCAAACCTGGACGTTGATAAGTGGAAAGACATCG
GAATATCCCCATGATCTTTCCCTCAATCGACGCTGATAAGTGGGAAGACAT
GTCGCGCCACACTCGATACCCTGCTCGGCGCTTGGTCTTCCCTTGGGAATC
GGTTGGCGGTGCCGCCCTCGTGCAACCAATCAAGTTTGGTGGCGATGTTG
CACCAACGCTTAGTGTCACCTACTACATCACTAAAAAGTTGAGTTAT
GGTTGGCGGCGCCGCCCTCGTGCAACCAACAAGTTTGGTGGCGATGTTG
CACCAACGCTAAGTGTAACCTACTACATCAAGGGGCATTACTAAAAAGTT
AAAATGCAGCACAGAATACTGTCAAGTTTGGTGGCGATGTTG
GGTTGGCGGCGCTGCCCTCGTGCAACCAAAGAAAAGTTTGGTGGCGATGTTG
CACCAACGCTTAGTGTCACCTACTACATCAACGGGTATTACTAAAAAGTTGAG
AATGACCGAAATCAAGGAAGCTTTTGTCCCCCCCAGTGATTGAAGTGCTAGTCG
TTGGCGATACCGTCTCCAAGGGCCAAAGTTTCAACCATGGAAGTACCTTCGTCA
AATGACCGAAATCAAGGAAGCTTTTGTCCCCCCCAGTGATTGAAGTGCTAG
TTGGCGATACCGTCTCCAAGGGCCAAACAACCATGGAAGTACCCTCGTCA
AATGACCGAAATCAAGGAAGCTTTTGTCGTCCCAGTGATTGAAGTGCTAGTC
TTGGCGATACCGTCTCCAAGGGCCAAAGCAACCATGGAAGTACCCTCGTCA


Trojan.
 
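I just altered it slightly to take into account a little rounding error: adding 0.5 before int() rounds the running target to the nearest whole number instead of always rounding down. As you can see, the ratio is set at the top to 7%, so if you want to run at 10% you just need to change that value.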
Code:
#!/usr/bin/perl -w
use strict;
my $ratio           = 7;  # Percentage error to apply
my $swapped_total   = 0;
my $characters_seen = 0;
my $charset         = "ACGT";
while(<DATA>) {
  chomp;
  # The line below calculates the number of characters to swap in this line
  my $this_line_todo = int(($characters_seen + length($_)) * $ratio / 100 +0.5)
                     - $swapped_total;
  # Now we need to find the positions of the characters to change
  my %indexes = ();  # Place to store indexes
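  # Collect distinct random positions until there are exactly $this_line_todo
  # (the original "* 2" swapped twice as many characters as were counted)
  $indexes{int(rand(length))}++
    while(scalar(keys %indexes) < $this_line_todo);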
  foreach my $index (keys %indexes) {
    my $char = substr($_,$index,1);
    my $newchar;
    do {
      $newchar = substr($charset, rand(4), 1);
    } while ($char eq $newchar);
    substr($_,$index,1) = lc $newchar;
  }
  $characters_seen += length;
  $swapped_total   += $this_line_todo;
  print $_,"\n";
}
print "Total characters seen is $characters_seen\n";
print "Total characters swapped is $swapped_total\n";
__DATA__
GAATATCCCCATGATCTTTCCCTCAATCGCCCGCTGATAAGTGGGAAGACATCG
GTCGCGCCACACTCGATACCCTGCTCATGGTGGCGCTTGGTCTTCCCTTGGGAAT
GAATATCCCCATGATCTTCCCCTCAATCGACCTGGACGCTGATAAGTGGAAAGACATCG
GTCGCGCCACACTCGATACCCTGCTCACCCGCATTGGCGCTTGGTCTTCCCTTGGGAAT
GAATATCCCCATGATCTTCCCCTCAAACCTGGACGTTGATAAGTGGAAAGACATCG
GAATATCCCCATGATCTTTCCCTCAATCGACGCTGATAAGTGGGAAGACAT
GTCGCGCCACACTCGATACCCTGCTCGGCGCTTGGTCTTCCCTTGGGAATC
GGTTGGCGGTGCCGCCCTCGTGCAACCAATCAAGTTTGGTGGCGATGTTG
CACCAACGCTTAGTGTCACCTACTACATCACTAAAAAGTTGAGTTAT
GGTTGGCGGCGCCGCCCTCGTGCAACCAACAAGTTTGGTGGCGATGTTG
CACCAACGCTAAGTGTAACCTACTACATCAAGGGGCATTACTAAAAAGTT
AAAATGCAGCACAGAATACTGTCAAGTTTGGTGGCGATGTTG
GGTTGGCGGCGCTGCCCTCGTGCAACCAAAGAAAAGTTTGGTGGCGATGTTG
CACCAACGCTTAGTGTCACCTACTACATCAACGGGTATTACTAAAAAGTTGAG
AATGACCGAAATCAAGGAAGCTTTTGTCCCCCCCAGTGATTGAAGTGCTAGTCG
TTGGCGATACCGTCTCCAAGGGCCAAAGTTTCAACCATGGAAGTACCTTCGTCA
AATGACCGAAATCAAGGAAGCTTTTGTCCCCCCCAGTGATTGAAGTGCTAG
TTGGCGATACCGTCTCCAAGGGCCAAACAACCATGGAAGTACCCTCGTCA
AATGACCGAAATCAAGGAAGCTTTTGTCGTCCCAGTGATTGAAGTGCTAGTC
TTGGCGATACCGTCTCCAAGGGCCAAAGCAACCATGGAAGTACCCTCGTCA


Trojan.
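
To see the per-line calculation from the script above in isolation, here is a rough sketch. The helper name per_line_quota and the line lengths (10, 10, 10) are made up for illustration; only the formula itself is taken from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): given a percentage and a list
# of line lengths, return how many characters to swap on each line so that
# the running total stays as close as possible to the requested percentage.
sub per_line_quota {
    my ($ratio, @lengths) = @_;
    my ($seen, $done, @quota) = (0, 0);
    for my $len (@lengths) {
        $seen += $len;
        # Rounded target for everything seen so far, minus what is already done
        my $todo = int($seen * $ratio / 100 + 0.5) - $done;
        push @quota, $todo;
        $done += $todo;
    }
    return @quota;
}

# Three made-up lines of 10 characters each at 7%
print join(",", per_line_quota(7, 10, 10, 10)), "\n";
```

Because the target is computed on the cumulative count, a short line may get no swaps at all while a later line picks up the remainder.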
 
Hi again

Thanks for your help.

I appreciate your work, Trojan.

regards
 
Hi Trojan.

The script is working, thanks for it, but I am searching for a script that reads a file like this:

#!/usr/bin/perl -w
use strict;
use warnings;

# Get the DNA sequence data
print "Please type the filename of the DNA sequence data: ";

my $dna_filename = <STDIN>;

chomp $dna_filename;

then reads the file into an array,

and mutates the sequence (AAAAAAAAAATTTTTTGGGGCCCCCC) in the file (filename) by 7% and by 10% only, and saves it in another file (filename).

That is what I am really searching for.

I appreciate any help

regards

mama12
 
OK, try this then:
Code:
#!/usr/bin/perl -w
use strict;
my $charset = "ACGT";

# Get the DNA sequence data filename
print "Please type the filename of the DNA sequence data: ";
my $dna_filename = <>;
chomp ($dna_filename);

# Validate filename
$dna_filename =~ /^\w+(?:\.\w+)$/
  or die "Invalid filename format [$dna_filename]";

# Process once at 7% and once at 10%
foreach my $ratio (7,10) {
  my $swapped_total   = 0;
  my $characters_seen = 0;

  # Open file for reading
  open FH, "<", $dna_filename
    or die "Failed to open file [$dna_filename]: $!";

  open OUT, ">", "${dna_filename}_${ratio}percent"
    or die "Failed to create output file [${dna_filename}_${ratio}percent]: $!";
  while(<FH>) {
    chomp;
    # The line below calculates the number of characters to swap in this line
    my $this_line_todo = int(($characters_seen + length($_)) * $ratio / 100 +0.5)
                       - $swapped_total;
    # Now we need to find the positions of the characters to change
    my %indexes = ();  # Place to store indexes
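    # Collect distinct random positions until there are exactly $this_line_todo
    # (the original "* 2" swapped twice as many characters as were counted)
    $indexes{int(rand(length))}++
      while(scalar(keys %indexes) < $this_line_todo);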
    foreach my $index (keys %indexes) {
      my $char = substr($_,$index,1);
      my $newchar;
      do {
        $newchar = substr($charset, rand(4), 1);
      } while ($char eq $newchar);
      substr($_,$index,1) = lc $newchar;
    }
    $characters_seen += length;
    $swapped_total   += $this_line_todo;
    print OUT $_,"\n";
  }
  print "Total characters seen is $characters_seen\n";
  print "Total characters swapped is $swapped_total\n";
  close OUT;
  close FH;
}


Trojan.
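
As an aside, another way to pick the positions to change is to hold the lines in an array and shuffle the index list, rather than collecting random indexes in a hash. The sketch below is not the script above; the helper name mutate_positions is made up, and the two sequences stand in for lines that would be read from a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper (name is an assumption): mutate exactly $count distinct
# positions of $seq, writing each replacement in lower case so it is visible.
sub mutate_positions {
    my ($seq, $count) = @_;
    my @bases = split //, $seq;
    $count = @bases if $count > @bases;   # cannot change more positions than exist
    # Shuffle all indexes and take the first $count of them
    my @picked = (shuffle 0 .. $#bases)[0 .. $count - 1];
    for my $i (@picked) {
        # Choose among the three bases that differ from the current one
        my @choices = grep { $_ ne uc $bases[$i] } qw(A C G T);
        $bases[$i] = lc $choices[ int rand @choices ];
    }
    return join '', @bases;
}

# Stand-in for lines read from a file into an array
my @lines = qw(AAAACCCCTTTTGGGG GGGGTTTTCCCCAAAA);
print mutate_positions($_, 2), "\n" for @lines;
```

Shuffling the indexes always finishes in one pass, whereas drawing random indexes until enough distinct ones are found can take longer when the count approaches the line length.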
 
Hi again,

The script is working, thanks for it. I appreciate your work.

How can I make a small change to this script so that the output (the 7% and 10% mutations) is saved in one file instead of two, with the name of the file given by the user?

regards

mama12

 
mama12,
Can you understand anything that I have developed for you so far?
I've done 99% for you. I would be surprised if you cannot make the last little changes that you ask for.
The best way to learn is to try doing a little for yourself sometimes.
Give it a go and come back if you still have trouble.
Good Luck! :)


Trojan.
 
Thanks anyway for your work.

Kind Regards

mama12
 