Completed STANDARD GRANT National Science Foundation (US)

Collaborative Research: Use of Random Compression Matrices For Scalable Inference in High Dimensional Structured Regressions

$1.1M USD

Funder National Science Foundation (US)
Recipient Organization University of California-San Francisco
Country United States
Start Date Jun 15, 2022
End Date May 31, 2025
Duration 1,081 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2210206
Grant Description

As the scientific community moves into a data-driven era, there is an unprecedented opportunity to leverage large-scale imaging, genetic, and EHR data to better characterize and understand human disease and thereby improve treatment and prognosis. Consequently, the analysis of such datasets with flexible statistical models has become an enormously active area of research over the last decade.

To this end, this project plans to develop a completely new class of methods based on fitting statistical models to datasets obtained by compressing big data with a well-designed mechanism. This development enables efficient modeling of massive data on an unprecedented scale. While the investigators' motivation comes primarily from complex modeling and uncertainty quantification of massive biomedical data, the statistical methods are general enough to contribute to the related literature in machine learning and environmental sciences.

The overarching goal also includes developing software toolkits to better serve practitioners in related disciplines. Further, the project will provide first-hand training opportunities for graduate and undergraduate students, including women and students from minority communities, in state-of-the-art statistical methodologies and in imaging, genetic, and EHR data.

By disseminating the outcomes of the project to high school students in terminology they can understand, the project can have far-reaching effects in enhancing public scientific literacy about statistics.

Two crucial aspects of modern statistical learning approaches in the era of complex, high-dimensional data are accuracy and scale of inference. Modern data are increasingly complex and high dimensional, involving a large number of variables and large sample sizes, with complex relationships between variables. Developing practically efficient (in terms of storage and analysis) and theoretically “optimal” Bayesian high-dimensional parametric or nonparametric regression methods that draw accurate inference with valid uncertainties from such complex datasets is an extremely important problem.

To offer a general solution to this problem, the investigators will develop approaches based on data compression using a small number of random linear transformations. The approach either compresses the large number of records for each variable, in which case it preserves feature interpretation for adequate inference, or compresses the covariate vector of each sample to a lower dimension, in which case the focus is only on prediction of the response.

In either case, data compression facilitates storage-efficient, scalable, and accurate Bayesian inference and prediction in the presence of high-dimensional data with sufficiently rich parametric and nonparametric regression models. An important goal is to establish precise theoretical results on the convergence behavior of models fitted to compressed data as a function of the number of predictors, the sample size, the properties of the random linear transformations, and features of these models.
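
The two compression modes described above can be illustrated with a minimal sketch. This is not the investigators' method: the Gaussian compression matrices, the function names (compress_rows, compress_columns, posterior_mean), the simulated data, and the use of a conjugate Gaussian prior with unit noise variance are all assumptions chosen to keep the example short. Row compression multiplies the data on the left so that the feature columns survive, while column compression multiplies on the right so that only predictions remain interpretable.

```python
import numpy as np

def compress_rows(X, y, m, rng):
    """Compress n records into m < n pseudo-records; feature columns are untouched."""
    n = X.shape[0]
    # Assumed choice: i.i.d. Gaussian entries with variance 1/m.
    Phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    return Phi @ X, Phi @ y

def compress_columns(X, k, rng):
    """Compress each p-dimensional covariate vector to k < p dimensions."""
    p = X.shape[1]
    # Assumed choice: i.i.d. Gaussian entries with variance 1/k.
    Psi = rng.normal(0.0, 1.0 / np.sqrt(k), size=(p, k))
    return X @ Psi, Psi

def posterior_mean(X, y, prior_precision=1.0):
    """Posterior mean of the coefficients under a N(0, I / prior_precision)
    prior and unit noise variance (a stand-in for a richer Bayesian model)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + prior_precision * np.eye(d), X.T @ y)

# Simulated data (hypothetical): n records, p predictors, 5 active coefficients.
rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.normal(size=n)

# Mode 1: fewer records, same features, so each coefficient keeps its meaning.
Xr, yr = compress_rows(X, y, m=400, rng=rng)
beta_rows = posterior_mean(Xr, yr)      # one coefficient per original feature

# Mode 2: fewer features per record, so only the prediction is of interest.
Xc, Psi = compress_columns(X, k=10, rng=rng)
gamma = posterior_mean(Xc, y)           # coefficients on compressed features
y_pred = Xc @ gamma                     # predictions; a new x must also be mapped through Psi
```

In the first mode the fitted vector has one entry per original predictor, which is why feature interpretation is retained. In the second mode the coefficients refer to random combinations of predictors, so the output of interest is y_pred rather than gamma.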

The approaches will be used to study neurological disorders by combining brain imaging data, genetic data, and electronic health records (EHR) data from the UK Biobank database. The project will also contribute more broadly to advancing interdisciplinary research training and broadening participation in the statistical sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of California-San Francisco
