Active STANDARD GRANT National Science Foundation (US)

CAREER: High-Agreement Crowdsourcing for Difficult Language-Understanding Tasks

$5.5M USD

Funder	National Science Foundation (US)
Recipient Organization	New York University
Country	United States
Start Date	Oct 01, 2021
End Date	Sep 30, 2026
Duration	1,825 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2046556`

Grant Description

When engineers build modern artificial intelligence (AI) systems for language problems like question answering, they use datasets of examples to teach the systems how to solve the problem, rather than programming the systems directly. These datasets of examples are often collected through crowd work, where a large population of non-specialists is hired to come up with example answers to questions, example summaries of documents, or the like.

Having a diverse group of people provide data for pay is meant to make it possible to build specialized language technology systems quickly, and to ensure that they can cover a wide range of styles of language, but this has not always worked well in practice: Crowd work is often set up in a way that forces participants to work quickly and sloppily, and produces data that’s ineffective at teaching machines to do what we want. This award supports research that aims to fix this, by developing and evaluating best practices for crowd worker training, feedback, and bonus pay to help crowd-worker dataset creators develop professional skills and produce better data that will lead to truly effective language technologies.

The award will also support parallel efforts to train new scientists and engineers, including programming aimed at advanced technical students and outreach events aimed at newcomers to the field.

Technically, the project will establish a scientifically grounded set of practices for crowdsourced data collection for natural language understanding tasks like reading comprehension question answering, coreference resolution, and natural language inference, with a focus on methods that can ensure that the resulting data is diverse, challenging, and high-quality in the face of obstacles posed by subjectivity and legitimate annotator disagreements. The main experiments will isolate the effect of several novel techniques for data collection, covering the training, feedback, and incentive structures used in crowdsourced data collection.

A complementary thread will evaluate and refine task designs with the goal of identifying the task formulations that best isolate and reinforce model abilities to understand and reason with texts, informed by large experimental surveys of existing tasks. The accompanying education program will scale up processes for research mentorship to reach a larger fraction of the diverse and qualified undergraduate and graduate student population at New York University, both through seminars and taught research methods courses.

The accompanying outreach plan will support the development of a recurring workshop series for early-year undergraduates tentatively interested in careers in AI and language technology, recruiting especially from groups underrepresented in computing.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

New York University
