Completed · Standard Grant · National Science Foundation (US)

NSF-BSF: Collaborative Research: RI: Small: Multilingual Language Generation via Understanding of Code Switching

$2.48M USD

Funder	National Science Foundation (US)
Recipient Organization	University of Washington
Country	United States
Start Date	Oct 01, 2021
End Date	Dec 31, 2024
Duration	1,187 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2203097`

Grant Description

Human language technology has recently matured to the extent that computational systems can generally interact with users in ways that are natural to humans, not just to machines. However, most people in the world today are multilingual, and current approaches to language technology do not reflect the reality that multilingual communication is ubiquitous; that is, current technology can interact naturally with monolingual speakers, but not with multilingual ones.

Computational systems should be able to generate language that sounds equally natural to multilingual users, including nonnative speakers. This project first creates a large-scale, broad-coverage dataset of conversations between humans and an automatic system that is sophisticated enough to generate fluent multilingual (i.e., code-switched) utterances but simple enough for controlled experiments.

The dataset is far larger than those currently available and is based on a much more detailed understanding of language-switching strategies. Second, the project uses this dataset to develop new methods for incorporating code-switching into contemporary deep-learning language generation, with applications including dialogue systems, question answering, assistive technologies, summarization, and machine translation.

This innovation should benefit a large number of multilingual computer users, including less privileged users who currently must interact with machines in a language they do not speak fluently. Successful completion of the research program will pave the way for natural language technologies that better accommodate such users, helping to bridge the digital divide.

The overarching goal of this project is to develop multilingual, contextualized language generation technologies that are more controllable and more adaptable to multilingual users. The project pursues this goal through three objectives.

(1) It develops psycholinguistically grounded, scalable approaches to collecting corpora for studying how multilingual speakers adapt to each other's linguistic choices in text conversations. These methodologies are used to collect large-scale, rich datasets of multilingual human-machine conversations. These datasets, together with additional corpora of human code-switched interactions, should shed new light on cross-lingual usage patterns and provide a better understanding of how people use code-switching in written language.

(2) It uses the linguistic insights gained through this work to build classifiers that predict code-switching.

(3) It develops novel approaches to efficient, large-vocabulary neural language generation that incorporate these classifiers, allowing generation systems to introduce code-switching in a way that sounds natural to multilingual users.

Consequently, this project should dramatically advance our understanding of code-switching, especially in the relatively unexplored territory of written dialogue. Its contributions should also benefit the broad range of applications that rely on language generation, including dialogue systems, question answering, assistive technologies, summarization, and machine translation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Washington
