
Completed · Standard Grant · National Science Foundation (US)

RR: CompCog: A challenge suite for statistical word segmentation

$2.57M USD

Funder: National Science Foundation (US)
Recipient Organization: MGH Institute of Health Professions
Country: United States
Start Date: Oct 01, 2024
End Date: Aug 31, 2025
Duration: 334 days
Number of Grantees: 1
Roles: Principal Investigator
Data Source: National Science Foundation (US)
Grant ID: 2435735
Grant Description

A central scientific puzzle is how children manage to acquire language despite limited and inconsistent explicit feedback. Numerous mathematical results seem to suggest that acquiring a language should be impossible; the fact that children do it every day reveals a deep gap in the science of learning. Some research suggests that children make considerable headway by detecting patterns in what they hear, without any explicit teaching and even without knowing what is being talked about ("statistical" or "unsupervised" learning).

Indeed, much of the recent progress in "teaching" computers to understand language has made use of just this strategy. Even more compelling: numerous experiments have shown that both adults and infants are able to learn at least a little bit about language this way. How much they can learn remains unclear.

A central difficulty is that, mathematically, there are many different methods for pattern-detection, and it is unclear which one(s) humans use. This matters because some methods work better than others, and whether unsupervised pattern-detection can help solve the mystery of language learning depends on which method is used. The purpose of this project is to assemble a "challenge suite": a dataset that can be used to systematically evaluate and compare the candidate methods.
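To make "methods for pattern-detection" concrete, here is a minimal Python sketch of one classic candidate from the statistical word segmentation literature: transitional-probability (TP) segmentation, in which the learner estimates how predictable each syllable is from the one before it and posits a word boundary where predictability dips. This is an illustration only, not the method(s) this project will evaluate or endorse. The toy syllable lexicon, the function names `transitional_probabilities` and `segment`, and the local-minimum boundary rule are all hypothetical choices made for this example.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate TP(b | a) = count(a, b) / count(a) from adjacent syllable pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count each syllable only where it has a successor, so TPs out of `a` sum to 1.
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(syllables):
    """Insert a word boundary wherever a TP is lower than both neighbouring TPs."""
    tp = transitional_probabilities(syllables)
    # tps[i] is the TP between syllables[i] and syllables[i + 1].
    tps = [tp[(a, b)] for a, b in zip(syllables, syllables[1:])]
    words, current = [], [syllables[0]]
    for i in range(len(tps)):
        is_local_min = (
            0 < i < len(tps) - 1 and tps[i] < tps[i - 1] and tps[i] < tps[i + 1]
        )
        if is_local_min:
            # Close the current word before syllables[i + 1].
            words.append("".join(current))
            current = []
        current.append(syllables[i + 1])
    words.append("".join(current))
    return words


# Hypothetical artificial language: four trisyllabic "words" with no shared
# syllables, concatenated into a continuous stream with no pauses.
LEXICON = {
    "tupiro": ["tu", "pi", "ro"],
    "golabu": ["go", "la", "bu"],
    "bidaku": ["bi", "da", "ku"],
    "padoti": ["pa", "do", "ti"],
}
# Fixed word order chosen so every word is followed by several different words.
ORDER = [
    "tupiro", "golabu", "bidaku", "padoti", "golabu", "tupiro", "padoti",
    "bidaku", "tupiro", "bidaku", "golabu", "padoti", "tupiro",
]
stream = [syl for word in ORDER for syl in LEXICON[word]]

print(segment(stream))  # prints the 13 words of ORDER, in order
```

On this toy stream every within-word TP is 1.0 and every between-word TP is 1/3, so the local-minimum rule recovers the original words exactly. A different candidate method (for example, a fixed TP threshold, or a model that stores and reuses chunks) could segment noisier input differently, which is the kind of divergence a challenge suite is designed to expose.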

Such challenge suites have been instrumental in advancing artificial intelligence. This project also serves as a proof of concept to determine whether challenge suites are similarly beneficial for the science of learning, while at the same time providing valuable resources and training to the research community.

To develop the challenge suite, the investigators will first conduct a comprehensive, quantitative literature review (meta-analysis) of the largest body of work on unsupervised pattern-detection: adult statistical word segmentation. With input from outside experimenters, the investigators will use the meta-analysis to identify 10-15 key experiments. Together, these experiments will establish a basic set of facts about adult statistical word segmentation that any theory must account for.

The project will focus particularly on theoretically central phenomena that distinguish between competing theories. To measure different aspects of linguistic pattern-detection, each experiment will be run with a large number of participants (approx. 1,200 per experiment), and a subset of 3-5 experiments will be run with even larger samples (approx. 24,000 per experiment). The investigators will also develop a tool that lets researchers compare any mathematical theory of learning against these data to determine how well it matches human performance.

To determine how much of a real language each mathematical theory could learn, the investigators will develop a database of transcripts of child-directed speech in 3-5 languages, and each theory will be trained and tested on it. The challenge suite will be made available to all researchers as a download and through a website where researchers can submit their models and compare their results against those of other models.

This work will be publicized to the scientific community through a closing workshop focused on models of unsupervised word segmentation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

MGH Institute of Health Professions
