“This is one of the things that’s really fascinating about mathematical research, is sometimes you can see connections between topics, which on the surface they seem so different, but at a mathematical level, they might be using some of the same technical ideas.”
All of these questions involve mapping the likelihood of different variations on a biological theme: which combinations of mutations are most likely to arise in a particular protein, for example, or which chromosome mutations are most often found together in the same cancer cell. McCandlish explains that these are problems of density estimation, a statistical technique for predicting how often an event happens. Density estimation can be relatively straightforward, such as charting different heights within a group of people. But when dealing with complex biological sequences, such as the hundreds or thousands of amino acids that are strung together to build a protein, predicting the probability of each potential sequence becomes astonishingly difficult.
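To make the contrast concrete, here is a minimal Python sketch. It is not the method from the study. It estimates a density the simplest way possible, by counting how often each outcome appears, and then shows how quickly the number of possible protein sequences outgrows any dataset. The height values are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter


def empirical_density(observations):
    """Estimate each outcome's probability as its observed frequency."""
    counts = Counter(observations)
    total = len(observations)
    return {outcome: n / total for outcome, n in counts.items()}


def sequence_space_size(length, alphabet_size=20):
    """Number of distinct sequences of a given length.

    alphabet_size=20 assumes the 20 standard amino acids.
    """
    return alphabet_size ** length


# Toy data (invented for illustration): heights rounded to the nearest 10 cm.
heights_cm = [160, 170, 170, 180, 170, 160, 180, 170]
height_density = empirical_density(heights_cm)
print(height_density)

# A protein of only 100 amino acids already has 20**100 possible sequences,
# far more than any experiment could observe even once.
print(sequence_space_size(100))
```

Counting works for heights because a handful of bins covers every case. For sequences, almost every possibility is never observed, so a useful estimate has to share information across related sequences rather than count each one separately.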
McCandlish explains the fundamental problem his team is using math to address: “Sometimes if you make, say, one mutation to a protein sequence, it doesn’t do anything. The protein works fine. And if you make a second mutation, it still works fine, but then if you put the two of them together, now you’ve got a broken protein. We’ve been trying to come up with methods to model not just interactions between pairs of mutations, but between three or four or any number of mutations.”
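The two-mutation scenario McCandlish describes can be written as a short calculation. The sketch below is a toy illustration in Python, not the team's method, which extends to interactions among any number of mutations. It assumes protein function is measured on an arbitrary scale where 1.0 means fully working and 0.0 means broken. The numbers and function names are hypothetical.

```python
def additive_prediction(f_wt, f_a, f_b):
    """Expected function of the double mutant if the two mutations
    acted independently, each adding its own effect to the wild type."""
    return f_wt + (f_a - f_wt) + (f_b - f_wt)


def pairwise_interaction(f_wt, f_a, f_b, f_ab):
    """How far the measured double mutant departs from the additive
    expectation. Zero means the mutations do not interact."""
    return f_ab - additive_prediction(f_wt, f_a, f_b)


# Hypothetical measurements: each single mutant works fine,
# but the double mutant is broken.
f_wt, f_a, f_b, f_ab = 1.0, 1.0, 1.0, 0.0

expected = additive_prediction(f_wt, f_a, f_b)
interaction = pairwise_interaction(f_wt, f_a, f_b, f_ab)
print(expected, interaction)
```

Here the additive expectation says the double mutant should work, while the measurement says it does not. That gap is the interaction term, and it cannot be learned by studying either mutation alone.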
The methods they have developed can be used to interpret data from experiments that measure how hundreds of thousands of different combinations of mutations impact the function of a protein.
This study, reported in the Proceedings of the National Academy of Sciences, grew out of conversations with two CSHL colleagues: CSHL Fellow Jason Sheltzer and Associate Professor Justin Kinney. They worked with McCandlish to apply his methods to gene expression and the evolution of cancer mutations. Software released by McCandlish’s team will enable other researchers to use these same approaches in their own work. He says he hopes it will be applied to a variety of biological problems.