Active | Continuing Grant | National Science Foundation (US)

CAREER: High-Dimensional Learning and Inference from Heterogeneous Data Sources

$887.9K USD

Funder: National Science Foundation (US)
Recipient Organization: Harvard University
Country: United States
Start Date: Jul 01, 2025
End Date: Jun 30, 2030
Duration: 1,825 days
Number of Grantees: 1
Roles: Principal Investigator
Data Source: National Science Foundation (US)
Grant ID: 2440824
Grant Description

This project will develop novel statistical theories and methods for handling large, heterogeneous datasets. Modern scientific applications often produce heterogeneous data of different types for the same problem. For instance, a single-cell biologist may observe multiple types of sequencing data from diverse instruments, all relevant for understanding the biological pathways of a single complex disease.

The challenge lies in effectively combining these different data types to build statistical pipelines that outperform those developed using any one data type. Traditional statistical approaches struggle with this challenge. This project will establish a new statistical paradigm to address the complexities of such heterogeneous data while accounting for datasets with billions of variables.

The project outcomes will facilitate principled prediction and inference in applications ranging from single-cell biology to precision health and neuroimaging. The project will involve graduate student participation and the development of new curricula at graduate and undergraduate levels that incorporate the project outcomes. Additionally, the research will engage medical professionals to facilitate the dissemination of the research products in current biomedical practice.

This project will develop a modern statistical framework to address data heterogeneity in high dimensions, focusing on three key sub-themes: (i) creating principled and robust prediction strategies for multi-view learning, (ii) developing new inference pipelines and prediction analysis frameworks for meta-learning, and (iii) introducing novel inference methods for low-dimensional functionals under transfer learning. In multi-view learning, this project will quantify optimal strategies for cooperative learning, devise new adversarial learning techniques, and analyze the effects of interpolation learning.

In meta-learning, this project will introduce new debiasing strategies to tackle inference questions that arise during fine-tuning following an initial phase of pre-training. In transfer learning, this project will develop general-purpose strategies for ranking source distributions and establish new inference schemes for low-dimensional functionals of scientific relevance.

On the technical front, this project will introduce novel comparison inequalities, algorithmic proof methods, and leave-one-out techniques that effectively capture the interplay between high dimensionality and heterogeneity.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Harvard University
