| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | Carnegie-Mellon University |
| Country | United States |
| Start Date | Jul 01, 2021 |
| End Date | Jun 30, 2025 |
| Duration | 1,460 days |
| Number of Grantees | 3 |
| Roles | Principal Investigator; Co-Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2053804 |
Many areas of the physical, engineering, and biological sciences make extensive use of computer simulators to model complex systems. While these simulators can generate realistic synthetic data, they are often poorly suited for the inverse problem of inferring the underlying scientific mechanisms associated with observed real-world phenomena.
Hence, a recent trend in the sciences has been to fit approximate models to high-fidelity simulators, and then use these approximate models for scientific inference. Inevitably, any downstream analysis will depend on the trustworthiness of the approximate model, the data collected, and the design of the simulations. This project will advance statistical methods for understanding complex physical systems by providing improved procedures and new performance measures for simulator-based scientific inference and uncertainty quantification.
Our work will stimulate the development of data-focused collaborations and the training of students across a wide range of scientific areas, further expanding upon our ongoing interdisciplinary research efforts in high-energy physics, atmospheric science, climatology, and astronomy.
Parameter estimation, confidence sets, and hypothesis testing are the hallmarks of statistical inference. Traditional methods for such tasks often cannot be applied to problems in the physical sciences because of (i) complex data settings, and (ii) the fact that the only meaningful model exists as a high-fidelity forward simulator. For example, in high-energy physics, searches for new interactions and particles require hypothesis tests involving simulations of high-dimensional collision events and their interactions with particle detectors; in cosmology, scientists regularly use large N-body simulations to understand how the Universe formed and evolved; and in atmospheric science, inferring land-air carbon fluxes from satellite observations relies on complex atmospheric transport models.
A key question is whether one can still construct hypothesis tests and confidence sets with proper frequentist coverage and high power when the likelihood function, which connects underlying parameters with observable data, is intractable but one can forward-simulate observable data from an implicit likelihood model. A related question is how to calibrate and assess the performance of surrogate models fit to high-fidelity simulations.
This project works toward designing statistical procedures that unify classical statistics with modern machine learning (e.g., deep generative models, neural network classifiers and convex optimization) via the following aims: (1) Scalable tools and theory for constructing statistical tests and frequentist confidence sets with finite-sample validity in a simulator-based inference setting; (2) Statistically rigorous validation methods, which can quantify and diagnose the quality of fitted models of high-dimensional data with statistical confidence across both feature and parameter space; and (3) Sequential testing strategies that allow us to identify how to best simulate data to improve tests and confidence sets in Aim 1.
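One common way to validate a fitted surrogate against its simulator, in the spirit of Aim 2, is a classifier two-sample test: train a classifier to distinguish simulator draws from surrogate draws; held-out accuracy near 0.5 suggests agreement, while accuracy well above 0.5 flags a discrepancy. The Gaussian "simulator" and deliberately mis-fit "surrogate" below are illustrative assumptions only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Hypothetical stand-ins for simulator output and a slightly mis-fit surrogate.
sim = rng.normal(0.0, 1.0, size=(2000, 5))
surr = rng.normal(0.2, 1.0, size=(2000, 5))  # mean shifted by 0.2 per feature

X = np.vstack([sim, surr])
y = np.r_[np.zeros(2000), np.ones(2000)]  # 0 = simulator, 1 = surrogate
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
# Held-out accuracy near 0.5 => surrogate matches the simulator;
# accuracy clearly above 0.5 => detectable discrepancy.
print(acc)
```

A permutation test on the held-out accuracy turns this diagnostic into a formal hypothesis test, and inspecting the classifier's features can help localize where in feature space the surrogate fails.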
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.