Loading…
Loading grant details…
| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | University of Illinois At Urbana-Champaign |
| Country | United States |
| Start Date | May 01, 2021 |
| End Date | Aug 31, 2025 |
| Duration | 1,583 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2131335 |
Stochastic algorithms such as stochastic gradient descent (SGD) are the workhorse of modern data science. Such algorithms have been playing an important role in the success of deep learning. In spite of such empirical success, the behavior of SGD for challenging non-convex optimization problems as encountered in deep learning is shrouded in mystery.
There is limited understanding of how SGD navigates non-convex loss landscapes, how bad local minima are avoided, and how deep models learned using SGD generalize well on future data. The project focuses on gaining clarity of understanding of SGD dynamics and generalization for non-convex problems arising in the context of deep learning. The project also uses the improved understanding to develop prinipled approches to adaptively use validation sets to choose hyper-parameters and avoid overfitting.
The insights gained from the technical advances are applied to the challenging scientific problem of sub-seasonal to seasonal (S2S) weather forecasting, which focuses on forecasting weather on a few weeks to few months time-frame. Advances in S2S forecasting is critically important to a wide variety of application domains including water resource management, agriculture, energy, aviation, maritime planning, and emergency planning.
The project also engages the broader data science community, incorporating the gained insights for curricular enrichment, and broadening participation from underepresented groups.
The project studys SGD dynamics with primary focus on the over-parameterized setting, i.e., where the number of samples is smaller than the number of parameters, which is typical for deep learning. The dynamics is carefully studied based on two key matrices: the Hessian of the non-convex loss function and the covariance matrix of the stochastic gradients, their eigen-spectra, and the overlap between their principal subspaces.
Although the SGD dynamics happen in a high-dimensional space, the principal subspaces of these matrices can be low-dimensional. Tools from high-dimensional geometry and associated stochastic processes are utilized to characterize such low dimensional dynamics in high-dimensional spaces. Principled approaches to explain the intriguing generalization behavior of deep learning models trained with SGD are also developed based on the properties of these matrices.
Further, differential privacy based mechanisms are developed for adaptively using validation sets for choosing hyper-parameters and avoiding over-fitting in deep learning.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
University of Illinois At Urbana-Champaign
Complete our application form to express your interest and we'll guide you through the process.
Apply for This Grant