“New methods for measuring natural selection and predicting deleterious variants in noncoding regions of the human genome.”
Many genetic variants that influence phenotypes of interest are located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. I will describe a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which, therefore, are likely to be phenotypically important. LINSIGHT combines a generalized linear model for functional genomic data with a probabilistic model of molecular evolution. The method is fast and highly scalable, enabling it to exploit the ‘big data’ available in modern genomics. I will show that LINSIGHT outperforms the best available methods in identifying human noncoding variants associated with inherited diseases. In addition, I will describe an application of LINSIGHT to an atlas of human enhancers and show that the fitness consequences at enhancers depend on cell type, tissue specificity, and constraints at associated promoters. Finally, I will describe an extension of LINSIGHT that considers the full site frequency spectrum and allows for the estimation of position- and allele-specific selection coefficients. So far, we have applied this method to coding sequences in the human genome, where it reveals surprisingly strong selection on synonymous sites, classes of genes that have undergone relaxed and enhanced selection in recent human evolution, and other aspects of natural selection on coding regions. Work is underway to extend this method to noncoding regions.