| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | Columbia University |
| Country | United States |
| Start Date | Feb 01, 2025 |
| End Date | Jul 31, 2025 |
| Duration | 180 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2516872 |
Massive amounts of data are now collected by nearly every industry and academic discipline. Uncovering the hidden insights in such data holds the key to major scientific challenges such as understanding how the brain works, discovering mechanisms leading to diseases such as cancer and Alzheimer's disease, and combating climate change, among many others.
But discovering key features and important relationships in large, complex data poses major statistical and computational challenges. The investigator aims to develop new statistical machine learning approaches and theory for this task. These approaches break huge data sets into small random subsets, called minipatches, to enable both faster computation and improved statistical efficiency.
The new methods will be implemented in open-source software and applied to huge biomedical datasets in genomics and neuroscience. The project will provide undergraduate and graduate students training and professional development opportunities.
Discovering key features and important relationships in the large, complex data commonly found in biomedicine poses not only major computational challenges but also critical statistical challenges. To tackle these challenges, the investigator plans to develop a new framework termed minipatch learning. Inspired by the successes of random forests, stability approaches in high-dimensional statistics, and stochastic optimization strategies, the investigator will build ensembles from many tiny random subsets of both observations and features (variables), called minipatches.
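To make the idea of a minipatch concrete, the following is a minimal sketch of drawing one, assuming uniform sampling without replacement of rows and columns. The function name `sample_minipatch`, the sizes `m` and `k`, and the synthetic data are illustrative assumptions; the abstract does not specify a sampling scheme or patch sizes.

```python
import numpy as np

def sample_minipatch(n_obs, n_feat, m, k, rng):
    """Draw one minipatch: m observation indices and k feature indices.

    Uniform sampling without replacement is an assumption made for
    illustration, not a detail taken from the award abstract.
    """
    rows = rng.choice(n_obs, size=m, replace=False)
    cols = rng.choice(n_feat, size=k, replace=False)
    return rows, cols

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 200))   # synthetic data: 1000 observations, 200 features

rows, cols = sample_minipatch(X.shape[0], X.shape[1], m=50, k=10, rng=rng)
patch = X[np.ix_(rows, cols)]      # the 50 x 10 submatrix a base learner would see
```

Each base learner in an ensemble would be trained on one such submatrix, so its cost depends on `m` and `k` rather than on the full data dimensions.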
While ensemble learning strategies are commonly used in supervised machine learning, the investigator will use minipatch learning for three tasks: feature selection, model-agnostic inference for feature importance, and learning relationships among features through graphical models. Because the approach trains on tiny subsets of the data, it is expected to yield substantial computational and memory savings.
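One way an ensemble over minipatches could be used for feature selection is sketched below, under loudly stated assumptions: the base selector (keep the `top` features with the largest absolute correlation with the response on each patch) and the aggregation rule (how often a feature is kept, divided by how often it was sampled) are stand-ins chosen for illustration. They are not the investigator's method, and all names and parameter values are hypothetical.

```python
import numpy as np

def minipatch_selection_frequency(X, y, m, k, n_patches, top, rng):
    """Selection frequency of each feature across random minipatches.

    Assumed base selector: absolute Pearson correlation with y,
    keeping the `top` highest-scoring features in each patch.
    """
    n, p = X.shape
    sampled = np.zeros(p)    # times each feature appeared in a patch
    selected = np.zeros(p)   # times each feature was kept by the base selector
    for _ in range(n_patches):
        rows = rng.choice(n, size=m, replace=False)
        cols = rng.choice(p, size=k, replace=False)
        Xc = X[np.ix_(rows, cols)]
        Xc = Xc - Xc.mean(axis=0)
        yc = y[rows] - y[rows].mean()
        denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
        score = np.abs(Xc.T @ yc) / denom
        keep = cols[np.argsort(score)[-top:]]
        sampled[cols] += 1
        selected[keep] += 1
    # Features never sampled get frequency 0 rather than a division by zero.
    return selected / np.maximum(sampled, 1)

# Synthetic example: only features 0 and 1 influence the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 100))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=500)

freq = minipatch_selection_frequency(X, y, m=50, k=10, n_patches=2000, top=2, rng=rng)
top2 = set(np.argsort(freq)[-2:].tolist())
```

Every iteration touches only a 50 x 10 submatrix, which is where the computational and memory savings described above would come from. Features whose selection frequency stays high across many patches are the candidates to retain.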
The investigator aims to show, both theoretically and empirically, that such a strategy offers significant statistical advantages as well.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.