| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | New York University |
| Country | United States |
| Start Date | Mar 15, 2021 |
| End Date | Feb 28, 2026 |
| Duration | 1,811 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2041872 |
Deep learning (DL) is a major driving force of the tech industry, where it is used for a wide range of problems such as image, speech, and video recognition, image segmentation, and natural language processing. DL is also increasingly used in physics, medicine, and chemistry, among other disciplines. Training a DL model in any of these applications requires solving a mathematical optimization problem whose properties are poorly understood.
Consequently, existing DL training methodologies are sub-optimal and consume large amounts of resources, time, and money. Our limited understanding of DL compromises progress across the public and private sectors that rely on DL technology and limits its deployment in new applications. This project aims to overcome this limitation by describing universal properties of DL systems that hold across a variety of DL models and data sets.
The acquired knowledge will be used to develop a new generation of training strategies that are tailored to the DL setting and are efficient, accurate, and scalable. The resulting algorithmic tools will have a strong impact on a wide range of applications and can be leveraged by US public and private entities to shift to significantly more powerful computational learning platforms in the areas of their AI-based businesses that require processing large and complex data.
Broader impact activities of this project include (a) graduate and undergraduate curriculum development, (b) summer research opportunities for high-school students via the NYU Applied Research Innovations in Science and Engineering program, and (c) knowledge popularization via the NYU Tandon ECE Seminar Series on Modern AI, organized by the investigator, which is open to universities, high schools, and industry and is streamed worldwide.
The proposed research takes a multi-level approach to exploring the principles of DL optimization and generalization and to developing a new generation of DL optimization tools. First, the researchers will seek to understand the relationship between the geometric properties of the non-convex DL loss landscape and the generalization abilities of DL models. Next, they will characterize the training trajectories of common DL optimizers.
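The connection between loss-landscape geometry and generalization is often illustrated by measuring how sharply the loss rises in a small neighborhood of a trained solution (flatter minima are commonly associated with better generalization). Below is a minimal sketch of such a sharpness probe, using a toy quadratic-plus-sine function as a hypothetical stand-in for a DL training objective; the function names and the loss itself are illustrative, not the project's actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_loss(w):
    # Toy non-convex function standing in for a DL training objective.
    return np.sum(w**2) + 0.5 * np.sin(3.0 * w[0])

def sharpness_probe(w, loss_fn, radius=0.1, n_dirs=100):
    """Estimate local sharpness: the worst-case loss increase over random
    unit directions within a ball of the given radius (a common proxy for
    how 'flat' the landscape is around w)."""
    base = loss_fn(w)
    worst = 0.0
    for _ in range(n_dirs):
        d = rng.standard_normal(w.shape)
        d *= radius / np.linalg.norm(d)  # project onto the radius sphere
        worst = max(worst, loss_fn(w + d) - base)
    return worst

w = np.zeros(5)
print(sharpness_probe(w, toy_loss))  # small value -> locally flat region
```

In practice such probes are run on the high-dimensional parameters of a trained network, with normalization of the perturbation directions to make values comparable across layers.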
These studies will be essential for developing landscape-aware DL optimizers. The resulting optimizers will be parallelized to be compatible with the architecture of the computer clusters typically used to train large-scale DL networks on massive data. The new parallel optimizers will accommodate dynamic allocation of computational resources during training and will be able to process extremely large data batches.
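The standard pattern that such large-batch parallel optimizers build on is synchronous data parallelism: a large batch is sharded across workers, each worker computes a gradient on its shard, and the gradients are averaged before the update. A minimal single-process sketch of this pattern on a toy linear model (all names are illustrative, not the project's code):

```python
import numpy as np

def grad(w, X, y):
    # Gradient of mean squared error for a linear model -- a toy
    # stand-in for a DL model's backward pass.
    return 2.0 * X.T @ (X @ w - y) / len(y)

def parallel_sgd_step(w, X, y, lr=0.1, n_workers=4):
    """One synchronous data-parallel step: shard the (large) batch across
    workers, compute per-shard gradients, then average them. For
    equal-sized shards the average equals the full-batch gradient."""
    shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
    grads = [grad(w, Xi, yi) for Xi, yi in shards]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                # noiseless targets, so w_true is optimal
w = np.zeros(3)
for _ in range(200):
    w = parallel_sgd_step(w, X, y)
```

In a real cluster the per-shard gradient computation runs on separate devices and the averaging is a collective all-reduce; the arithmetic, however, is exactly this averaging step.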
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.