Completed STANDARD GRANT National Science Foundation (US)

CIF:Small: Theory and Methods for Simultaneous Feature Auto-grouping and Dimension Reduction in Supervised Multivariate Learning

$3.4M USD

Funder National Science Foundation (US)
Recipient Organization Florida State University
Country United States
Start Date Jun 01, 2021
End Date May 31, 2025
Duration 1,460 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2105818
Grant Description

Modern real-world applications have created an urgent need for analyzing and interpreting high-dimensional data with low-dimensional structures. In situations where a large number of response variables is present, very few features may be completely irrelevant to the entire set of responses; this leads to ineffective sparsity-based variable selection and to non-interpretable vanilla low-rank modeling.

To address these issues, this project proposes grouping the features based on their contributions to the response variables, in a possibly low-dimensional subspace, in order to build a more parsimonious and interpretable model. In the context of multivariate learning, the intrinsic cost of searching for clusters and the potential adverse effect of high-dimensionality on signal recovery are not yet fully understood.

Another critical challenge in the big-data era is to develop efficient optimization algorithms with rigorous convergence guarantees. The fact that the obtained algorithmic solutions may not be globally optimal, due to the non-convexity of the problem, makes the statistical error analysis nontrivial. The associated model-selection problem also remains unsolved in the context of clustering, most notably when the number of features and/or the number of responses exceeds the sample size.

To answer these questions, innovative and transformative statistical methods are being introduced, and the proposed algorithms are being analyzed to demonstrate their efficiency. The project covers potential applications in a wide range of areas such as machine learning, genomics, and macro-econometrics, and will help cross-fertilize ideas from statistics, operations research, economics, and bio-engineering.

Education activities are tightly coupled with research, and include course development, student mentoring, outreach, and recruiting underrepresented students.

The project proposes a novel clustered reduced-rank learning framework that utilizes joint matrix regularizations to relax the stringent assumption of sparsity-based learning and to gain interpretability as compared with vanilla low-rank modeling. Universal information-theoretic limits reveal the intrinsic cost of searching for clusters, regardless of the estimator in use, as well as the benefit of accumulating a large number of response variables in multivariate learning.

Efficient optimization algorithms that perform simultaneous subspace learning and clustering are being developed; the resulting fixed-point estimators, while not necessarily globally optimal, still enjoy the desired statistical accuracy beyond the standard likelihood setup. Finally, a new kind of information criterion for joint cluster and rank selection is being proposed, without assuming either infinite sample size or large signal-to-noise ratio.

The research is creating a fusion between statistics, information theory, nonconvex optimization, and model selection, with real-world applications.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Florida State University
