| Field | Value |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | University of Washington |
| Country | United States |
| Start Date | Oct 01, 2021 |
| End Date | Dec 31, 2024 |
| Duration | 1,187 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2203097 |
Human language technology has recently matured to the extent that computational systems can generally interact with users in ways that are natural to humans, not just to machines. However, most people in the world today are multilingual, and current approaches to language technology do not reflect the reality that multilingual communication is ubiquitous; that is, current technology can interact naturally with monolingual speakers, but not with multilingual ones.
Computational systems should be able to generate language that sounds equally natural to these users, which includes accommodating non-native speakers. This project first creates a large-scale, broad-coverage dataset of conversations between humans and an automatic system that is sophisticated enough to generate fluent multilingual (i.e., 'code-switched') utterances, yet simple enough for controlled experiments.
The dataset is far larger than those currently available and is based on a much more detailed understanding of language-switching strategies. Second, this dataset is used to develop new methods for incorporating code-switching into contemporary deep-learning language generation, including dialogue systems, question answering, assistive technologies, summarization and machine translation.
This innovation should benefit a large number of multilingual computer users, including less privileged users who are currently required to interact with machines in a language they do not speak fluently. Successful completion of the research program will pave the way for natural language technologies that are more accommodating to such users, helping to bridge the digital divide.
The overarching goal of this project is to develop multilingual and contextualized language generation technologies that are more controllable and more adaptable to multilingual users. The project pursues this goal through the following objectives. (1) It develops psycholinguistically grounded, scalable approaches to collecting corpora for studying how multilingual speakers adapt to each other's linguistic choices in text conversations.
These methodologies are employed to collect large-scale, rich datasets of multilingual human-machine conversations. These datasets, as well as additional corpora of human code-switched interactions, should shed new light on the theoretical understanding of cross-lingual usage patterns, allowing for better understanding of how people employ code-switching in written language. (2) It uses the linguistic insights obtained through this endeavor to define classifiers that predict code-switching. (3) Novel approaches are developed for efficient, large-vocabulary neural language generation that incorporate these classifiers, allowing generation systems to introduce code-switching in a way that sounds natural to multilingual users.
Consequently, this project should dramatically advance our understanding of code-switching, especially in the relatively unexplored territory of written dialogue. In addition, its contributions benefit a broad range of applications that rely on language generation, including dialogue systems, question answering, assistive technologies, summarization and machine translation.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.