Active CONTINUING GRANT National Science Foundation (US)

CAREER: From One Language to Another

$5.02M USD

Funder	National Science Foundation (US)
Recipient Organization	University of Colorado At Boulder
Country	United States
Start Date	May 15, 2021
End Date	May 31, 2026
Duration	1,842 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2149404`

Grant Description

Language technology has become an integral part of how we interact with the world of information, but sophisticated natural language processing (NLP) tools are available for only a handful of the approximately 7,000 languages spoken across the world. Modern data-driven methods for developing NLP tools generally rely on the availability of enormous amounts of data for the language in question, an obstacle that may be insurmountable for many languages, especially languages lacking significant digital resources and languages with small or diminishing numbers of speakers.

This project aims to remove barriers to building NLP tools for languages with less data by developing new methods that incorporate knowledge about the linguistic properties of languages into models learned from data. Learning how to build faster paths to NLP tools for new languages has the potential to rapidly advance the state of language technology for any language.

In addition, the tools and knowledge developed here have the potential to speed up the description of endangered languages, helping to secure an informed record of the world's languages while there are still speakers to learn from.

The imbalance in access to language technologies arises in part because current NLP models and algorithms need to learn from large amounts of training data. This project addresses that imbalance by adapting methods from cross-lingual transfer learning, in which models learned on one language are adapted and exploited to make predictions for another language.

One innovation of this project is to investigate the incorporation of expert linguistic knowledge for improving model transfer. Two types of linguistic knowledge will be injected into artificial neural network models for morphological analysis and part-of-speech tagging: a) knowledge about relationships between individual languages and language families; and b) knowledge about specific linguistic properties of individual languages and language families.

The models will be evaluated both intrinsically and extrinsically, the latter by studying the usefulness of the models for human linguistic analysis and as part of the language documentation and description workflow.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Colorado At Boulder
