Captainrave
Technical User
Hi everyone. I need help again! Basically I have DNA repeats in a csv file with some other information in adjacent columns.
The repeats can be anything like (always in the first column):
AT|AT|AT|AT (4)
ATTA|ATTA|ATTA|ATTA (4)
A|A|A|A|A|A (6) etc....
What I need to do is output a new column that contains the "number of repeat units". I have broken up the repeats (above) to try and show you what I mean. So something like ATATATAT, when broken down, would be assigned the number 4, since it is AT repeated 4 times.
I had an idea for this. I was hoping to reuse the regex I originally used to locate the repeats, and use $& (the text the regex matched, which Perl keeps in memory) to work out the number of repeat units. I have no idea how to implement this, and wondered if anyone had a better idea?
Code:
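#!C:/Perl/bin/perl.exe
use strict;
use warnings;

# Opening repeat distribution file
print "please type the filename of the repeatdistribution.csv file:";
chomp(my $repeat_filename = <STDIN>);
print "please type the filename to save the results to (.csv format !!important!!):";
chomp(my $outfile = <STDIN>);
open(my $repeat_fh, '<', $repeat_filename) or die "Cannot open $repeat_filename: $!";
open(my $out_fh, '>', $outfile) or die "Cannot open $outfile: $!";
###########################
# Output Unit Size #
###########################
while (my $line = <$repeat_fh>) {
    chomp $line;
    # Splits the line into columns; the repeat is always in the first one
    my ($firstcol) = split /,/, $line;
    # Filters the repeats: $1 is one repeat unit, $& is the whole repeat
    # (=~ binds the regex to $firstcol; a plain = would assign to it instead)
    my $units = '';
    if (defined $firstcol && $firstcol =~ m/([acgt]+)\1{3,39}/i) {
        # Number of repeat units = length of whole repeat / length of one unit
        $units = length($&) / length($1);
    }
    # Appends the count as a new last column (left empty if no repeat is found)
    print {$out_fh} "$line,$units\n";
}
close $repeat_fh;
close $out_fh;
exit;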
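
One way the $& idea might work, as a rough sketch rather than a definitive answer: after a successful match, $1 holds a single repeat unit and $& holds the whole repeat, so dividing their lengths gives the number of units. The helper name count_repeat_units is made up for illustration, and the pattern is the one from the post with the optional lookahead dropped (it appears to have no effect on what is matched).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the repeat
# unit and the number of units for the first repeat found in $seq, or an
# empty list when nothing matches.
sub count_repeat_units {
    my ($seq) = @_;
    # $1 = one repeat unit, followed by 3 to 39 further copies of it
    if ($seq =~ m/([acgt]+)\1{3,39}/i) {
        my $unit  = $1;
        my $whole = $&;    # the entire matched repeat
        return ($unit, length($whole) / length($unit));
    }
    return;
}

# The three examples from the post, written without the | separators
for my $seq (qw(ATATATAT ATTAATTAATTAATTA AAAAAA)) {
    my ($unit, $count) = count_repeat_units($seq);
    print "$seq,$unit,$count\n";    # e.g. ATATATAT,AT,4
}
```

Note that [acgt]+ is greedy, so it prefers the longest unit that still fits: a run of eight A's would come out as AA x 4 rather than A x 8. If the shortest unit is what is wanted, changing it to [acgt]+? should make the match prefer that instead.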