
Permutations

Status
Not open for further replies.

dmcmunn

Programmer
May 20, 2003
85
US
I'm looking for a reference on how best to design a class for handling variable-length pattern matching in permutations of a variable-length string, for a genome search project on which I am embarking.

Example:

Source
CCGGGCACTGATGAGACAGCGGCTGTTTGA
Offset
123456789.123456789.123456789.

Search for: GA

Would return
(10) T_GA_T
(13) T_GA_G
(15) A_GA_C
(29) T_GA_
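
A minimal C++ sketch of the lookup in this example, assuming plain exact matching with std::string::find. The names Hit and find_all are hypothetical and not taken from any existing library. Offsets are 1-based to line up with the ruler in the post, and each hit carries at most one flanking base per side.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One occurrence of the pattern plus its immediate neighbours.
struct Hit {
    std::size_t offset;   // 1-based position of the first matched base
    std::string context;  // e.g. "T_GA_T"; a side is empty at a sequence end
};

// Hypothetical helper: report every occurrence of `pattern` in `source`,
// including overlapping ones, by restarting the search one base further on.
std::vector<Hit> find_all(const std::string& source, const std::string& pattern) {
    std::vector<Hit> hits;
    if (pattern.empty()) return hits;  // an empty pattern is treated as "no hits"

    for (std::size_t pos = source.find(pattern);
         pos != std::string::npos;
         pos = source.find(pattern, pos + 1)) {
        const std::size_t end = pos + pattern.size();
        assert(end <= source.size());  // guaranteed by std::string::find

        std::string context;
        if (pos > 0) context += source[pos - 1];           // left neighbour
        context += '_' + pattern + '_';
        if (end < source.size()) context += source[end];   // right neighbour

        hits.push_back({pos + 1, context});
    }
    return hits;
}
```

Called as find_all("CCGGGCACTGATGAGACAGCGGCTGTTTGA", "GA"), this yields four hits at offsets 10, 13, 15 and 29, the last one without a right-hand neighbour because the sequence ends there.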

Additional functionality built from this would be:
Pattern search after pattern found
Pattern search between identical patterns
Pattern search between unique pattern pairs
Pattern missing between unique pattern pairs
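
One possible shape for such a class, as a rough C++ sketch only. SequenceSearch and its member names are hypothetical, and the semantics are assumptions: the "pair" is the first occurrence of the left pattern and the first occurrence of the right pattern after it, and the same call covers identical patterns when left and right are equal. Offsets here are 0-based indexes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class; names and semantics are illustrative assumptions.
class SequenceSearch {
public:
    using Offsets = std::vector<std::size_t>;  // 0-based start positions

    explicit SequenceSearch(std::string sequence) : seq_(std::move(sequence)) {}

    // Pattern search after pattern found: every `next` that starts after
    // the end of the first occurrence of `first`.
    Offsets find_after(const std::string& first, const std::string& next) const {
        const std::size_t p = first.empty() ? std::string::npos : seq_.find(first);
        if (p == std::string::npos) return {};
        return find_in(next, p + first.size(), seq_.size());
    }

    // Pattern search between a pair: every `inner` lying wholly inside the
    // gap between `left` and `right` (pass left == right for identical patterns).
    Offsets find_between(const std::string& left, const std::string& right,
                         const std::string& inner) const {
        std::size_t begin = 0, end = 0;
        if (!gap(left, right, begin, end)) return {};
        return find_in(inner, begin, end);
    }

    // Pattern missing between a pair: true only if the pair exists and
    // its gap holds no occurrence of `inner`.
    bool missing_between(const std::string& left, const std::string& right,
                         const std::string& inner) const {
        std::size_t begin = 0, end = 0;
        return gap(left, right, begin, end) && find_in(inner, begin, end).empty();
    }

private:
    // All occurrences of `pattern` contained in the half-open range [begin, end).
    Offsets find_in(const std::string& pattern, std::size_t begin, std::size_t end) const {
        assert(begin <= end && end <= seq_.size());
        Offsets out;
        if (pattern.empty()) return out;
        for (std::size_t pos = seq_.find(pattern, begin);
             pos != std::string::npos && pos + pattern.size() <= end;
             pos = seq_.find(pattern, pos + 1)) {
            out.push_back(pos);
        }
        return out;
    }

    // Locate the gap between the first `left` and the first `right` after it.
    bool gap(const std::string& left, const std::string& right,
             std::size_t& begin, std::size_t& end) const {
        if (left.empty() || right.empty()) return false;
        const std::size_t l = seq_.find(left);
        if (l == std::string::npos) return false;
        begin = l + left.size();
        end = seq_.find(right, begin);
        return end != std::string::npos;
    }

    std::string seq_;
};
```

With the source string from the example, find_between("CAC", "CAG", "GA") returns the 0-based offsets 9, 12 and 14 (add 1 to compare with the ruler in the post), and missing_between("CAG", "TTT", "GA") is true because the gap CGGCTG contains no GA. A real design would probably want to iterate over every pair rather than only the first one.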

Can anyone recommend where I might find more information on the extant C++ classes or other OO bodies of work published on genome data search algorithms?

Best regards,
dmcmunn
 
Most languages, including scripting languages like bash and Ruby, support regular expressions.
For Unix systems and Windows, at least, free versions of 'sed' and 'awk' are available, and both are great at pattern searching.

sed -n '/.GA.\?/p' data
or
grep ".GA.\?" data

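prints the lines of the file 'data' that contain a match such as TGAC, CGAT, ..., or CGA (the \? for "optional" is a GNU extension to basic regular expressions).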

To find the offsets, and to store them in a collection for additional searches, a programming language will help.
Java could be used for this task; java.util.regex and java.lang.String are the places to start in the docs.
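
For comparison, here is a rough sketch of the same idea in C++ rather than Java, since C++ is what the original question asked about. It swaps java.util.regex for the standard <regex> header; regex_hits is a hypothetical name. It collects each match together with its 1-based starting offset, which is the offset of the flanking base, not of the GA itself.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: (1-based start offset, matched text) for every
// non-overlapping match of `expr` in `source`.
std::vector<std::pair<std::size_t, std::string>>
regex_hits(const std::string& source, const std::string& expr) {
    assert(!expr.empty());      // an empty expression would match at every position
    const std::regex re(expr);  // ECMAScript syntax is the std::regex default

    std::vector<std::pair<std::size_t, std::string>> hits;
    for (std::sregex_iterator it(source.begin(), source.end(), re), last;
         it != last; ++it) {
        hits.emplace_back(static_cast<std::size_t>(it->position()) + 1, it->str());
    }
    return hits;
}
```

Here ".GA.?" is the ECMAScript spelling of the sed/grep pattern. On the example string CCGGGCACTGATGAGACAGCGGCTGTTTGA it gives TGAT at 9, AGAC at 14 and TGA at 28. Note that the GA at offset 13 is not reported: regex iteration returns non-overlapping matches, and the T in front of that GA was already consumed by the first match. For overlapping hits, a plain substring search that restarts one position after each hit is the safer tool.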

 
While I appreciate that you wanted C++ examples, there is quite a lot of activity on nucleotide string processing in the Perl forum.

Perl has a number of idiosyncrasies (as the saying goes, 'strong typing is for weak minds'), but its regular expression processing, pattern matching, and string processing are the best there is.

You might want to check out where they already have quite a lot of functionality that might be of use to you.
 
Thanks stefenwagner and stevexff!

Being weak-minded, it so happens that I made a good living in the early days of the web writing CGI scripts in Perl, so I guess I'll check out bio.perl.org first.

Thanks again for the help.
dmcmunn
 