| Field | Value |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | University of Washington |
| Country | United States |
| Start Date | Oct 01, 2021 |
| End Date | Sep 30, 2025 |
| Duration | 1,460 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2113530 |
Building a state-of-the-art natural language processing (NLP) system costs millions of dollars, because it requires training a neural network architecture with hundreds of billions of parameters (1000 times more in 2021 than just three years ago) on billions of text documents. At present, a single architecture, the "transformer", is used in question answering, summarization, machine translation, analysis and generation systems, text classification, and virtually every other NLP research system.
A significant reduction in the transformer's costs will lower barriers to participation in research for the vast majority of research groups around the world and reduce the environmental footprint of NLP research. Principled methods for reducing that cost are also expected to transfer readily to the generations of models that will, inevitably, replace the transformer.
This project begins with a randomized approximation to the standard attention function that reduces the transformer's runtime and memory requirements from quadratic to linear in the input length. The project then examines this randomized approach through the lens of "rational models". Rational models have offered a unifying view of earlier generations of neural models popular in NLP (convolutional and recurrent networks) and have led to gains in computational efficiency and interpretability.
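The quadratic-to-linear reduction can be illustrated with a small NumPy sketch. The abstract does not say which randomized approximation the project uses, so this example assumes one well-known option: positive random features whose inner product estimates the softmax kernel exp(q·k). The function names, the projection matrix `W`, and the omission of the usual 1/sqrt(d) score scaling are all choices made for this illustration, not details from the award.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Exact attention: builds an (n, n) score matrix, so cost is quadratic in n."""
    scores = Q @ K.T
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ V

def positive_random_features(X, W):
    """Map rows of X (n, d) to m positive features using W (m, d), W ~ N(0, I).

    In expectation, phi(q) . phi(k) equals exp(q . k).
    """
    m = W.shape[0]
    half_sq_norm = 0.5 * np.sum(X ** 2, axis=1, keepdims=True)
    return np.exp(X @ W.T - half_sq_norm) / np.sqrt(m)

def linear_attention(Q, K, V, W):
    """Approximate attention without ever forming the (n, n) matrix.

    Keys and values are summarized once in an (m, d_v) matrix and an (m,)
    vector, so time and memory grow linearly with the sequence length n.
    """
    phi_q = positive_random_features(Q, W)  # (n, m)
    phi_k = positive_random_features(K, W)  # (n, m)
    kv_summary = phi_k.T @ V                # (m, d_v)
    normalizer = phi_k.sum(axis=0)          # (m,)
    numerator = phi_q @ kv_summary          # (n, d_v)
    denominator = phi_q @ normalizer        # (n,)
    return numerator / denominator[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 16, 8, 4096                   # illustrative sizes only
    Q = 0.3 * rng.standard_normal((n, d))
    K = 0.3 * rng.standard_normal((n, d))
    V = rng.standard_normal((n, d))
    W = rng.standard_normal((m, d))
    gap = np.abs(softmax_attention(Q, K, V) - linear_attention(Q, K, V, W)).max()
    print("max abs difference:", gap)
```

The saving comes from the order of multiplication: computing `phi_k.T @ V` first costs O(n·m·d), whereas the exact version must materialize all n² query-key scores. The approximation is stochastic, and its accuracy depends on the number of random features `m` and on the norms of the queries and keys.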
A second research direction focuses on the efficiency of gradient-based training algorithms. Empirical evidence has shown that neural network learning proceeds in two phases: a fast phase that is sensitive to hyperparameters, followed by a slow phase that is more robust. This project establishes the extent to which the pattern holds with current NLP models and then seeks to exploit the pattern to speed up the second phase.
Both directions will make the transformer architecture more efficient and significantly reduce its financial and environmental costs, and potentially do the same for future neural network architectures. The project's implementations will be made available as open-source software with friendly licenses permitting wide adoption.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.