Active STANDARD GRANT National Science Foundation (US)

DMS/NIGMS 1: Statistical Methods for Design and Analysis of Clinical-scale Single Cell Studies

$6M USD

Funder	National Science Foundation (US)
Recipient Organization	University of Pennsylvania
Country	United States
Start Date	Jul 01, 2023
End Date	Jun 30, 2026
Duration	1,095 days
Number of Grantees	2
Roles	Principal Investigator; Co-Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2245575`

Grant Description

Over the last two decades, scientific discoveries in biology and medicine have increasingly relied on the statistical analysis of large genomic data sets. Thus, the development of rigorous statistical methods for reproducible and scalable analyses of such data sets, and the interdisciplinary training of a next generation of scientists who can straddle the computational and biomedical sciences, are crucial for our scientific advancement.

This project focuses on challenges in the analysis of data from single cell experiments, which are made possible by recent technological developments enabling genome-scale measurements to be made on individual cells, at high throughput across millions of cells at once. Such single cell experiments are now the bread and butter of biomedical research, playing a cardinal role in our quest for a more complete understanding of cell biology as well as our pursuit of cures for every disease, from infectious diseases such as COVID-19 to cancer to aging-related maladies such as neurodegeneration.

Despite the promise of these new technologies, current methods for single cell data analysis are not designed for clinical-scale disease research, and do not adequately harness the information provided by the many recently completed single cell reference atlases, the latter realized through years of consortia-level efforts and millions of dollars in national funding. This project aims to bridge this computational analysis gap, specifically by addressing two critical limitations in the field: (1) the lack of a principled approach for the removal of unwanted technical variation from clinical-scale single cell sequencing studies, and (2) the need for an integrated approach to clinical-scale proteomic profiling that harnesses single cell reference atlases to achieve more precise cell type composition analysis.

For both problems, the project’s goal is to develop principled, transparent, and scalable methods for reproducible scientific research. The broader impacts of this project include its potential to improve the well-being of individuals in society; its contributions to STEM education for undergraduate and graduate students; and the involvement and participation in research of women and under-represented minorities in STEM fields.

In more technical detail, the first aim of the project focuses on the removal of technical “batch effects”. Technical batch correction is an unavoidable and challenging step in *all* single cell data analysis pipelines. Many methods have been proposed for batch effect correction in single cell experiments, but they were designed to align cells across all batches, ignoring design principles such as control cohorts, longitudinal sampling, and biological replicates.

Without utilizing these design principles, existing methods cannot adequately estimate batch effects and often confound them with real biological signals. Aim 1 of the project formulates new statistical methods for batch effect correction that are adaptable to a multitude of experimental designs and develops statistical inference procedures for quantifying the strength of biological signals (e.g., differential expression or emergence of a new cell type) while accounting for the uncertainty in batch correction.
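To illustrate why a design element such as a control cohort helps separate batch effects from biology, here is a minimal sketch in Python. It is not the project's method: it assumes a single measured feature, a purely additive per-batch shift, and control samples that are biologically comparable across batches. The function name `control_anchored_correction` and the toy numbers are hypothetical.

```python
from statistics import mean


def control_anchored_correction(values, batches, is_control):
    """Remove an additive per-batch offset estimated from control samples only.

    Assumption (illustrative): controls share the same biology in every batch,
    so differences between their batch means reflect technical variation.
    """
    controls_by_batch = {}
    for value, batch, control in zip(values, batches, is_control):
        if control:
            controls_by_batch.setdefault(batch, []).append(value)

    missing = set(batches) - set(controls_by_batch)
    if missing:
        raise ValueError(f"batches without control samples: {sorted(missing)}")

    # Mean of the controls within each batch.
    control_means = {b: mean(vs) for b, vs in controls_by_batch.items()}
    # Reference level: average of the per-batch control means.
    reference = mean(list(control_means.values()))
    # Offset of each batch relative to the reference.
    offsets = {b: m - reference for b, m in control_means.items()}

    corrected = [v - offsets[b] for v, b in zip(values, batches)]
    return corrected, offsets


# Toy data: batch "B" carries a technical shift of +0.5 relative to batch "A".
values = [1.0, 1.2, 2.0, 2.2, 1.5, 1.7, 2.5, 2.7]
batches = ["A", "A", "A", "A", "B", "B", "B", "B"]
is_control = [True, True, False, False, True, True, False, False]

corrected, offsets = control_anchored_correction(values, batches, is_control)
```

Because the offsets are estimated from controls alone, the case-versus-control difference within each batch is left untouched, whereas a method that aligns all cells across batches could absorb part of that difference into the correction.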

Aim 2 of the project tackles a different challenge arising in clinical-scale single cell proteomic profiling: Despite decreasing sequencing costs, flow and mass cytometry are still orders of magnitude faster and cheaper, and thus remain the methods of choice in clinical-scale immunological studies where large cohorts need to be profiled on a tight timeline. However, each flow/mass cytometry run only measures a limited panel of proteins, and thus does not allow cell-type labelling at the level of detail afforded by single cell transcriptomics.

Aim 2 develops a new approach to cell type profiling that integrates multiple flow/mass cytometry runs, with complementary panels, on the same sample, with the goal of achieving cell-type tabulation accuracy matching state-of-the-art single cell sequencing protocols at only a fraction of the time and cost. This aim leverages the growing compendium of single cell reference atlases, providing a roadmap for the future use of these atlases in population-level cell type censusing projects.

Through collaborations, the developed methods will be applied to multiple ongoing large-cohort studies that have direct clinical impact. Methods developed in this project will be released as open-source software, and the datasets generated will be uploaded to a public repository for general use.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Pennsylvania

