Completed STANDARD GRANT National Science Foundation (US)

Minipatch Learning for Selection, Stability, Inference, and Scalability

$1.95M USD

Funder	National Science Foundation (US)
Recipient Organization	Columbia University
Country	United States
Start Date	Feb 01, 2025
End Date	Jul 31, 2025
Duration	180 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2516872`

Grant Description

Massive amounts of data are now collected by nearly every industry and academic discipline. Uncovering the hidden insights in such data holds the key to major scientific challenges such as understanding how the brain works, discovering mechanisms leading to diseases such as cancer and Alzheimer's disease, and combating climate change, among many others.

But discovering key features and important relationships in complex and huge data poses major statistical and computational challenges. The investigator aims to develop new statistical machine learning approaches and theory for this task that break up huge data sets into small random subsets called minipatches to facilitate both faster computation and improved statistical efficiency.

The new methods will be implemented in open-source software and applied to huge biomedical datasets in genomics and neuroscience. The project will provide undergraduate and graduate students training and professional development opportunities.

Discovering key features and important relationships in complex and huge data commonly found in biomedicine poses not only major computational challenges but also critical statistical challenges. To tackle these challenges, the investigator plans to develop a new framework termed minipatch learning. Inspired by the successes of random forests, stability approaches in high-dimensional statistics, and stochastic optimization strategies, the investigator will build ensembles from many random tiny subsets of both observations and features or variables called minipatches.

While ensemble learning strategies are commonly used in supervised machine learning, the investigator will use minipatch learning for the tasks of feature selection, model-agnostic inference for feature importance, and learning relationships amongst features through graphical models. The approach, which trains on very tiny subsets of the data, is expected to have dramatic computational and memory savings.

The investigator aims to show both theoretically and empirically that such a strategy poses significant statistical advantages as well.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Columbia University

Interested in applying for this grant?

Complete our application form to express your interest and we'll guide you through the process.

Apply for This Grant

Minipatch Learning for Selection, Stability, Inference, and Scalability

Grant Description

All Grantees

Interested in applying for this grant?

Quick Summary

Related Grants