Completed NON-SBIR/STTR RPGS NIH (US)

Scalable Computational Methods for Genealogical Inference: from species level to single cells

$3.15M USD

Funder NATIONAL HUMAN GENOME RESEARCH INSTITUTE
Recipient Organization University of California Berkeley
Country United States
Start Date Sep 01, 2023
End Date Sep 14, 2024
Duration 379 days
Number of Grantees 3
Roles Co-Investigator; Principal Investigator
Data Source NIH (US)
Grant ID 10889303
Grant Description

PROJECT SUMMARY

Massive amounts of genomic data are currently being generated, providing unprecedented opportunities for biomedical researchers to characterize various biological components and processes. In order to utilize these data to make new biological discoveries and improve human health, accurate models and scalable computational tools need to be developed to facilitate analysis and interpretation. The central objective of this project is to address this challenge by developing more realistic probabilistic models, scalable algorithms, and user-friendly software tools to enable the biomedical research community to better harness large genomic data. Many problems in genomics rely on computational methods for inferring genealogical information from large sequence data and interpreting the reconstructed trees. In this application, we propose to make significant strides towards improving this line of research by developing a suite of robust and scalable algorithms for probabilistic models of molecular evolution and genealogical inference across multiple timescales. We will achieve our goal by carrying out the following specific aims:

1) A fundamental problem in statistical analysis of molecular evolution is estimating model parameters, for which maximum likelihood estimation (MLE) is typically employed. Unfortunately, MLE is a computationally expensive task, in some cases prohibitively so. In Aim 1, we will utilize a novel MLE framework and modern optimization methods to develop a broadly applicable computational method that achieves several orders of magnitude speedup in MLE while maintaining high statistical efficiency for general models of molecular evolution. We will apply our tools to improve phylogenetic inference for two clinically important superfamilies of membrane proteins in humans, namely G protein-coupled receptors (GPCRs) and solute carrier (SLC) transporters.

2) Because of meiotic recombination, the genetic variability within humans cannot be represented by a single tree. Instead, there are millions of different trees across the genome, where each position in the genome will tend to have its own tree that differs only minimally from the trees at nearby sites. The collection of all these trees, and the set of recombination points creating new trees, is represented by the Ancestral Recombination Graph (ARG), which has a number of applications in human genetics. Despite substantial recent progress on reconstructing ARGs, however, current methods are either too slow to scale up to large data sets, or they do not sample ARGs accurately from a well-calibrated posterior distribution. In Aim 2, we will develop a new scalable computational method to improve ARG reconstruction and sampling. We will test the method extensively on simulated data, develop a number of applications, and apply it to a number of different human data sets to illustrate its utility.

3) Applications of genealogical inference methods have been rapidly growing in single-cell genomics. In particular, advances in CRISPR/Cas9 genome editing technologies have enabled lineage tracing for thousands of cells in vivo, and the problem of reconstructing trees from such data has received considerable attention recently. In Aim 3, we will develop scalable algorithms to reconstruct time-resolved single-cell trees for thousands of cells sampled at multiple time points. We will also develop a novel statistical method grounded in rigorous theory to improve fitness estimation from trees. We will apply the methods developed here to analyze single-cell lineage-tracing data from an iterative metastasis experiment to study cancer evolution, as well as B cell affinity maturation data from a highly innovative experimental design to study germinal center evolution.

All Grantees

University of California Berkeley
