| Field | Details |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | Harvard University |
| Country | United States |
| Start Date | Jul 01, 2024 |
| End Date | Jun 30, 2029 |
| Duration | 1,825 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2338760 |
Causal inference is a systematic approach to uncovering causal relationships between entities from empirical observations, an epistemic framework that underlies past, present, and future scientific and social development. The gold standard for causal inference is the randomized clinical trial, in which the researcher assigns treatment or exposure to study subjects by a purely random mechanism.
Random assignment eliminates systematic bias in the observed relationship between the treatment/exposure and the outcome that would otherwise arise from unknown common causes, referred to as confounders. However, randomized clinical trials are often infeasible, expensive, and ethically challenging. In contrast, modern technological advances have made it possible to collect massive amounts of observational data in domains such as health outcomes, environmental pollution, medical claims, educational policy interventions, and genetic mutations, among many others.
Because accounting for confounders in such data is fundamental to valid causal inference, a major focus of modern causal inference research has been to design procedures that account for complex confounding structures without imposing unrealistic statistical models. Despite the large body of existing methods, a complete picture of the optimal statistical methods for inferring the causal effect of an exposure on an outcome while adjusting for arbitrary confounders remains largely open.
Moreover, several widely used methods still lack rigorous theoretical justification and may need modification to support reproducible statistical research in causal inference. This project addresses these gaps through two broad, interconnected themes. The first part provides the first rigorous theoretical analysis of the most popular method of confounder adjustment in large-scale genetic studies that seek causal variants of diseases.
This analysis will in turn raise deeper questions about optimal statistical procedures for causal inference, which the second part of the project will explore. Because the project connects ideas from statistical methodology, probability theory, computer science, and machine learning, it will create unique educational opportunities, including the design of new courses.
The project will therefore integrate research with education through course development, research mentoring for undergraduate and graduate students, especially those from underrepresented groups, and summer programs.
This project will focus on two broad, interrelated themes unified by the goal of conducting statistical and causal inference with modern observational data. The first part of the project provides the first detailed theoretical analysis of the most popular principal component-based method for adjusting for population stratification in genome-wide association studies.
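To make the principal component-based adjustment concrete, here is a minimal sketch on simulated data, assuming the common practice of adding the top genotype principal components as covariates in a per-SNP linear regression. It is not the specific procedure analyzed in the project. Two hypothetical populations differ in allele frequencies and in mean phenotype, while no SNP has a causal effect; the function names, simulation settings, and single-PC choice are illustrative assumptions.

```python
import numpy as np

def pc_adjusted_betas(G, y, k):
    """Per-SNP regression slope of y on each SNP, adjusting for the top-k
    principal components of the genotype matrix (k=0 means no adjustment).
    Hypothetical helper for illustration only."""
    n, p = G.shape
    Gc = G - G.mean(axis=0)                    # center each SNP column
    U, _, _ = np.linalg.svd(Gc, full_matrices=False)
    pcs = U[:, :k]                             # top-k PC scores (up to scaling)
    base = np.column_stack([np.ones(n), pcs])  # intercept + PC covariates
    betas = np.empty(p)
    for j in range(p):
        X = np.column_stack([base, G[:, j]])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        betas[j] = coef[-1]                    # coefficient on SNP j
    return betas

def simulate(n_per_pop=500, p=200, seed=0):
    """Hypothetical two-population data: allele frequencies differ between
    populations, and the phenotype depends on population but on no SNP."""
    rng = np.random.default_rng(seed)
    f0 = rng.uniform(0.1, 0.5, size=p)
    f1 = f0 + 0.3
    G = np.vstack([rng.binomial(2, f0, size=(n_per_pop, p)),
                   rng.binomial(2, f1, size=(n_per_pop, p))]).astype(float)
    pop = np.repeat([0.0, 1.0], n_per_pop)
    y = 1.0 * pop + rng.normal(size=2 * n_per_pop)
    return G, y

G, y = simulate()
naive = pc_adjusted_betas(G, y, k=0)
adjusted = pc_adjusted_betas(G, y, k=1)
print(f"mean |beta|, no adjustment: {np.abs(naive).mean():.3f}")
print(f"mean |beta|, 1 PC:         {np.abs(adjusted).mean():.3f}")
```

Without adjustment, every SNP picks up a spurious association through population membership. Including the top PC, which here closely tracks population, shrinks those slopes toward zero. Real studies involve subtler structure, which is where the theoretical questions raised above arise.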
This part of the project also aims to provide new methodologies that correct both known and previously unrecognized biases in the existing methodology, as well as guidelines for practitioners on choosing among methods and designing studies. Motivated by the central goal of large-scale genetic data analysis, namely identifying the causal genetic determinants of disease phenotypes, the second part of the project develops the first complete picture of optimal statistical inference for causal effects, both in high-dimensional models under sparsity and in nonparametric models under smoothness conditions.
Moreover, this part of the project addresses the fundamental question of how to tune learning algorithms for estimating nuisance functions, such as the outcome regression and the propensity score, so as to minimize the downstream mean-squared error of the causal effect estimate rather than the prediction error of these regression functions. The overall research will connect ideas from high-dimensional statistical inference, random matrix theory, higher-order semiparametric methods, and information theory.
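As an illustration of how the two nuisance functions named above enter a causal effect estimate, here is a minimal sketch of the standard first-order augmented inverse probability weighting (AIPW, doubly robust) estimator of the average treatment effect. It assumes a logistic propensity model and linear outcome regressions on simulated confounded data. This is a textbook estimator chosen for illustration, not the higher-order methods or tuning procedures developed in the project. All function names and simulation settings are hypothetical.

```python
import numpy as np

def fit_logistic(X, t, iters=25):
    """Logistic regression via Newton-Raphson; X must include an intercept column."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        H = X.T @ (X * (p * (1.0 - p))[:, None])  # Hessian of the negative log-likelihood
        w += np.linalg.solve(H, X.T @ (t - p))    # Newton step
    return w

def aipw_ate(X, t, y, clip=0.01):
    """First-order AIPW (doubly robust) estimate of E[Y(1)] - E[Y(0)]."""
    n = X.shape[0]
    Xi = np.column_stack([np.ones(n), X])
    # Nuisance 1: propensity score e(x) = P(T = 1 | X = x)
    e = 1.0 / (1.0 + np.exp(-Xi @ fit_logistic(Xi, t)))
    e = np.clip(e, clip, 1.0 - clip)
    # Nuisance 2: outcome regressions mu_a(x) = E[Y | T = a, X = x], one per arm
    mu = {}
    for a in (0, 1):
        coef, *_ = np.linalg.lstsq(Xi[t == a], y[t == a], rcond=None)
        mu[a] = Xi @ coef
    # Plug-in contrast plus inverse-propensity-weighted residual corrections
    psi = (mu[1] - mu[0]
           + t * (y - mu[1]) / e
           - (1 - t) * (y - mu[0]) / (1 - e))
    return psi.mean()

# Hypothetical confounded data: X[:, 0] raises both treatment uptake and the outcome.
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(float)
y = 2.0 * t + 2.0 * X[:, 0] + X[:, 1] + rng.normal(size=n)  # true ATE = 2

naive = y[t == 1].mean() - y[t == 0].mean()
print(f"naive difference in means: {naive:.2f}")
print(f"AIPW estimate:             {aipw_ate(X, t, y):.2f}")
```

The naive contrast is biased upward by the confounder, while the AIPW estimate recovers the true effect when the nuisance models are adequate. In this sketch the nuisance fits are fixed parametric models with nothing to tune. With flexible learners, how to tune them for the error of `psi.mean()` rather than for prediction accuracy is exactly the open question described above.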
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.