
Completed · Standard Grant · National Science Foundation (US)

Minipatch Learning for Selection, Stability, Inference, and Scalability

$1.95M USD

Funder National Science Foundation (US)
Recipient Organization Columbia University
Country United States
Start Date Feb 01, 2025
End Date Jul 31, 2025
Duration 180 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2516872
Grant Description

Massive amounts of data are now collected by nearly every industry and academic discipline. Uncovering the hidden insights in such data holds the key to major scientific challenges such as understanding how the brain works, discovering mechanisms leading to diseases such as cancer and Alzheimer's disease, and combating climate change, among many others.

But discovering key features and important relationships in complex and huge data poses major statistical and computational challenges. The investigator aims to develop new statistical machine learning approaches and theory for this task that break up huge data sets into small random subsets called minipatches to facilitate both faster computation and improved statistical efficiency.

The new methods will be implemented in open-source software and applied to huge biomedical datasets in genomics and neuroscience. The project will provide undergraduate and graduate students with training and professional development opportunities.

Discovering key features and important relationships in the complex and huge data commonly found in biomedicine poses not only major computational challenges but also critical statistical challenges. To tackle these challenges, the investigator plans to develop a new framework termed minipatch learning. Inspired by the successes of random forests, stability approaches in high-dimensional statistics, and stochastic optimization strategies, the investigator will build ensembles from many tiny random subsets of both the observations and the features (variables), called minipatches.

While ensemble learning strategies are commonly used in supervised machine learning, the investigator will use minipatch learning for the tasks of feature selection, model-agnostic inference for feature importance, and learning relationships among features through graphical models. Because it trains on very tiny subsets of the data, the approach is expected to yield dramatic computational and memory savings.

The investigator aims to show both theoretically and empirically that such a strategy offers significant statistical advantages as well.
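To make the minipatch idea concrete, here is a minimal sketch of minipatch feature selection: on each tiny random subset of observations and features, a simple base selector picks the most promising features, and features selected in a large fraction of the minipatches are kept (a stability criterion). The function name, the correlation-based base selector, and all parameter choices are illustrative assumptions, not the investigator's actual method.

```python
import numpy as np

def minipatch_feature_selection(X, y, n_patches=200, n_obs=30, n_feats=10,
                                top_k=3, threshold=0.5, seed=0):
    """Illustrative minipatch feature selection (not the authors' code).

    On each minipatch (random rows x random columns), score each
    sampled feature by its absolute covariance with the response and
    select the top_k. Features selected in at least `threshold` of
    the minipatches they appeared in are returned as stable.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)   # times each feature was selected
    appear = np.zeros(p)   # times each feature appeared in a minipatch
    for _ in range(n_patches):
        rows = rng.choice(n, size=n_obs, replace=False)
        cols = rng.choice(p, size=n_feats, replace=False)
        Xi, yi = X[np.ix_(rows, cols)], y[rows]
        # simple base selector: absolute covariance with the response
        score = np.abs((Xi - Xi.mean(0)).T @ (yi - yi.mean())) / n_obs
        picked = cols[np.argsort(score)[-top_k:]]
        counts[picked] += 1
        appear[cols] += 1
    freq = np.divide(counts, appear, out=np.zeros(p), where=appear > 0)
    return np.where(freq >= threshold)[0], freq

# Toy data: only features 0 and 1 drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=200)
selected, freq = minipatch_feature_selection(X, y)
```

Note that each minipatch fit touches only an `n_obs x n_feats` block of the data, which is the source of the computational and memory savings described above, while aggregating selection frequencies across many minipatches provides the stability that motivates the framework.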

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Columbia University
