I'm looking for a reference on how best to design a class for handling variable-length pattern matching in permutations of a variable-length string, for a genome search project on which I am embarking.
Example:
Source
CCGGGCACTGATGAGACAGCGGCTGTTTGA
Offset
123456789.123456789.123456789.
Search for: GA
Would return
(10) T_GA_T
(13) T_GA_G
(15) A_GA_C
(29) T_GA_
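To make the intended behaviour concrete, here is a minimal C++17 sketch of the basic search. The names Hit and find_all are hypothetical, chosen only for illustration, and the sketch assumes 1-based offsets and a single flanking base on each side, as in the listing above. It uses a plain std::string_view::find scan rather than any specialised genome index.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// One match: where it starts and how it looks with its neighbours.
// (Hypothetical type, named for this sketch only.)
struct Hit {
    std::size_t offset;   // 1-based position of the first pattern character
    std::string context;  // left base + "_" + pattern + "_" + right base
};

// Scan `source` for every occurrence of `pattern`, including overlapping ones.
// A flanking base is omitted when the match touches either end of the source.
std::vector<Hit> find_all(std::string_view source, std::string_view pattern) {
    std::vector<Hit> hits;
    if (pattern.empty()) return hits;  // an empty pattern matches nothing here

    for (std::size_t pos = source.find(pattern);
         pos != std::string_view::npos;
         pos = source.find(pattern, pos + 1)) {
        std::string ctx;
        if (pos > 0) ctx += source[pos - 1];        // base before the match
        ctx += '_';
        ctx.append(pattern.data(), pattern.size());
        ctx += '_';
        const std::size_t end = pos + pattern.size();
        if (end < source.size()) ctx += source[end]; // base after the match
        hits.push_back({pos + 1, ctx});
    }
    return hits;
}
```

Calling find_all(source, "GA") on the string above yields one Hit per occurrence, which can then be printed as "(offset) context".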
Additional functionality built from this would be:
Pattern search after pattern found
Pattern search between identical patterns
Pattern search between unique pattern pairs
Pattern missing between unique pattern pairs
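As a rough illustration of the "between" searches listed above, the following C++17 sketch isolates the region between a left and a right delimiter pattern and then tests whether a third pattern is present or missing inside it. The function names region_between, found_between and missing_between are hypothetical, and the sketch assumes that only the first left delimiter and the first right delimiter after it are of interest.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Return the text strictly between the first `left` found at or after `from`
// and the first `right` that follows it, or nullopt if either is absent.
// Passing the same pattern for `left` and `right` covers the
// "between identical patterns" case.
std::optional<std::string_view> region_between(std::string_view source,
                                               std::string_view left,
                                               std::string_view right,
                                               std::size_t from = 0) {
    const std::size_t l = source.find(left, from);
    if (l == std::string_view::npos) return std::nullopt;
    const std::size_t start = l + left.size();
    const std::size_t r = source.find(right, start);
    if (r == std::string_view::npos) return std::nullopt;
    return source.substr(start, r - start);
}

// True when the delimited region exists and contains `pattern`.
bool found_between(std::string_view source, std::string_view left,
                   std::string_view right, std::string_view pattern) {
    const auto region = region_between(source, left, right);
    return region && region->find(pattern) != std::string_view::npos;
}

// True when the delimited region exists and does NOT contain `pattern`.
bool missing_between(std::string_view source, std::string_view left,
                     std::string_view right, std::string_view pattern) {
    const auto region = region_between(source, left, right);
    return region && region->find(pattern) == std::string_view::npos;
}
```

"Pattern search after pattern found" falls out of the same idea: pass the end of the previous match as the starting position of the next find call.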
Can anyone recommend where I might find more information on existing C++ classes, or other published object-oriented work, on genome data search algorithms?
Best regards,
dmcmunn