Abstractly, an RNA sequence can be treated as a string over the nucleotide alphabet A, C, G, and U (adenine, cytosine, guanine, and uracil, respectively). The string is said to fold into a set of nested base pairs, known as a secondary structure, through the pairing of complementary nucleotides (for example, G is complementary to C but not to A).
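To make the string view concrete, here is a minimal sketch in Python. It assumes the common dot-bracket notation, which this page does not itself introduce: matching parentheses mark a base pair and a dot marks an unpaired nucleotide. The names `COMPLEMENT`, `parse_dot_bracket`, and `is_valid_structure` are hypothetical, and only the Watson-Crick pairs A-U and G-C are treated as complementary.

```python
# Assumption: only Watson-Crick pairs count as complementary in this sketch.
COMPLEMENT = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def parse_dot_bracket(structure):
    """Return the set of base pairs (i, j), i < j, encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)  # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.add((stack.pop(), j))  # innermost open position pairs with j
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def is_valid_structure(sequence, structure):
    """Check that every base pair in the structure joins complementary nucleotides."""
    if len(sequence) != len(structure):
        return False
    return all(
        (sequence[i], sequence[j]) in COMPLEMENT
        for i, j in parse_dot_bracket(structure)
    )


print(sorted(parse_dot_bracket("(((...)))")))
print(is_valid_structure("GGGAAACCC", "(((...)))"))
```

Because balanced parentheses cannot cross, every structure written this way is automatically nested.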
A natural question then is, "How close are these predictions to the structures that appear in nature?" The answer is not clear, and this brings us to the proposed project areas for this summer.
The optimization problem mentioned in the overview is typically based on the Nearest Neighbor Thermodynamic Model (NNTM) and seeks to minimize the free energy of the structure. The energy function used to score secondary structures involves thousands of parameters. Yet the optimal secondary structures obtained under this scoring scheme are often very different from the native structures, when the latter are known. This project will focus on using geometric combinatorics to analyze the space of parameters used in the energy function and the set of possible optimal structures.
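The flavor of this optimization can be illustrated with a much simpler scoring scheme. The sketch below is not the NNTM: it swaps in the classic Nussinov-style dynamic program, which maximizes the number of complementary base pairs rather than minimizing a free energy with thousands of parameters. The function name `max_base_pairs` and the `min_loop` parameter (the minimum number of unpaired nucleotides enclosed by a pair) are assumptions made for illustration.

```python
# Assumption: only Watson-Crick pairs are allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_base_pairs(sequence, min_loop=3):
    """Maximum number of nested base pairs, a toy stand-in for energy minimization."""
    n = len(sequence)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the substring sequence[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: position j pairs with k
                if (sequence[k], sequence[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAACCC"))
```

Replacing the "+1 per pair" score with a parameterized energy function is what makes the real problem sensitive to the choice of parameters.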
As mentioned above, a structure that has minimum free energy as defined by the NNTM may not be the native structure. It is therefore of interest to return a set of suboptimal structures instead of a single structure with minimum free energy; that is, to produce the set of structures whose free energy is within some range of the minimum. Such sets become very large for long RNA sequences, and methods are needed to analyze them. Students in this project will learn about various metrics on RNA secondary structures, and will use those metrics along with clustering algorithms to summarize or classify the types of structures that are near optimal.
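As one possible illustration of a metric combined with clustering, the sketch below uses the base-pair distance (the number of base pairs present in exactly one of two structures) together with a simple greedy "leader" clustering. Both choices are assumptions made for this example, as are the dot-bracket input format and the names `base_pairs`, `base_pair_distance`, and `leader_clusters`; the project may use other metrics and algorithms.

```python
def base_pairs(structure):
    """Base pairs (i, j) of a balanced dot-bracket string, as a frozenset."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return frozenset(pairs)


def base_pair_distance(a, b):
    """Number of base pairs that appear in exactly one of the two structures."""
    return len(base_pairs(a) ^ base_pairs(b))


def leader_clusters(structures, radius):
    """Greedy clustering: join the first cluster whose leader is within radius."""
    clusters = []  # list of (leader, members) tuples
    for s in structures:
        for leader, members in clusters:
            if base_pair_distance(leader, s) <= radius:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))  # no leader is close enough: start a new cluster
    return [members for _, members in clusters]


suboptimal = ["(((...)))", "((.....))", ".((...)).", "........."]
for group in leader_clusters(suboptimal, radius=1):
    print(group)
```

The result of leader clustering depends on the input order, so it is best read as a quick summary rather than a canonical partition.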
The projects are interdisciplinary, involving mathematics, computer science, and biology. Students should have taken at least two proof-based mathematics courses (e.g., abstract algebra, number theory, analysis, topology). As the projects involve some computational components, we especially encourage applications from students who have programming experience. Students with less experience in advanced mathematics but significant programming experience (the equivalent of at least two project-based computer science courses) will also be considered. All relevant biology will be presented during the early weeks of the project.
Please visit the GA Tech School of Mathematics Summer REU page for application details.
For more information about the Undergraduate Summer Research on RNA Structure Prediction, please e-mail RNAREU@math.gatech.edu.