Completed STANDARD GRANT National Science Foundation (US)

NSF-BSF: RI: Small: Efficient Transformers via Formal and Empirical Analysis

$5.2M USD

Funder	National Science Foundation (US)
Recipient Organization	University of Washington
Country	United States
Start Date	Oct 01, 2021
End Date	Sep 30, 2025
Duration	1,460 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2113530`

Grant Description

Building a state-of-the-art natural language processing (NLP) system costs millions of dollars, because it requires training a neural network architecture with hundreds of billions of parameters (1000 times more in 2021 than just three years ago) on billions of text documents. At present, a single architecture, the "transformer", is used in question answering, summarization, machine translation, analysis and generation systems, text classification, and virtually every other NLP research system.

A significant reduction in the transformer's costs will lower barriers to participation in research for the vast majority of research groups around the world and reduce the environmental footprint of NLP research. Principled methods for reducing that cost are also expected to transfer readily to the generations of models that will, inevitably, replace the transformer.

This project begins with a randomized approximation to the standard attention function that reduces the runtime and memory requirements of the transformer from quadratic to linear in the input length. The project then applies the lens of "rational models" to this randomized approach. Rational models have offered a unifying view of earlier generations of neural models popular in NLP (convolutional and recurrent networks) and have given rise to gains in computational efficiency and interpretability.
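As a rough illustration of how a randomized approximation can make attention linear in the input length, the sketch below uses positive random features to estimate the exponential kernel behind softmax attention. This is one well-known construction, chosen here for illustration only; the abstract does not specify the project's exact method. The function names, the feature count `m`, and the input scaling are all assumptions, and the usual 1/sqrt(d) scaling is assumed to be folded into the queries.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Exact attention: builds an n x n score matrix, so cost is quadratic in n."""
    scores = q @ k.T
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

def positive_random_features(x, w):
    """Map rows of x to m features whose dot products estimate exp(x . y).

    With w drawn from a standard normal, the expectation of
    phi(x) . phi(y) equals exp(x . y).
    """
    m = w.shape[0]
    proj = x @ w.T                                        # (n, m)
    half_sq_norm = 0.5 * np.sum(x * x, axis=1, keepdims=True)
    return np.exp(proj - half_sq_norm) / np.sqrt(m)

def random_feature_attention(q, k, v, w):
    """Approximate attention without ever forming the n x n matrix."""
    phi_q = positive_random_features(q, w)  # (n, m)
    phi_k = positive_random_features(k, w)  # (n, m)
    kv = phi_k.T @ v                        # (m, d_v): one pass over the keys
    z = phi_k.sum(axis=0)                   # (m,): normalizer statistics
    numerator = phi_q @ kv                  # (n, d_v)
    denominator = phi_q @ z                 # (n,)
    return numerator / denominator[:, None]

# Hypothetical toy sizes; small-norm inputs keep the estimator's variance low.
rng = np.random.default_rng(0)
n, d, m = 8, 4, 8192
q = 0.25 * rng.standard_normal((n, d))
k = 0.25 * rng.standard_normal((n, d))
v = rng.standard_normal((n, d))
w = rng.standard_normal((m, d))             # random projection directions

exact = softmax_attention(q, k, v)
approx = random_feature_attention(q, k, v, w)
print("max abs error:", np.max(np.abs(exact - approx)))
```

The key point is the order of multiplication: summarizing the keys and values into `kv` and `z` first costs time proportional to n times m, whereas the exact version needs work and memory proportional to n squared. The approximation is random, so its accuracy depends on `m` and on the magnitude of the inputs.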

A second research direction focuses on the efficiency of gradient-based training algorithms. Empirical evidence has shown that neural network learning proceeds in two phases: a fast phase that is sensitive to hyperparameters, followed by a slow phase that is more robust. This project establishes the extent to which the pattern holds with current NLP models and then seeks to exploit the pattern to speed up the second phase.

Both directions will make the transformer architecture more efficient and significantly reduce its financial and environmental costs, and potentially do the same for future neural network architectures. The project's implementations will be made available as open-source software with friendly licenses permitting wide adoption.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Washington
