Active STANDARD GRANT National Science Foundation (US)

Enhancing research on speech and deep learning through holistic acoustic analysis

$10M USD

Funder	National Science Foundation (US)
Recipient Organization	Northwestern University
Country	United States
Start Date	Aug 15, 2022
End Date	Jul 31, 2026
Duration	1,446 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2219843`

Grant Description

You can guess a lot about a person from the way they pronounce words. Remarkably, human listeners can tell if it is likely that talkers learned English as a first language or a second language, or if the talkers might have a brain injury that makes it difficult for them to speak. Such intuitions rely on human listeners’ holistic pattern recognition abilities; these allow us to perceive the important, meaningful, yet subtle differences between pronunciations.

However, the methods scientists currently use to measure speech objectively – based on a small number of properties of speech sounds – fail to capture these differences, hampering our ability to use speech to learn about the mind and brain. This project brings together speech scientists, computer scientists, and neuroscientists to test a radically different approach to this problem.

Machine learning will be used to discover a new method for quantifying differences between spoken utterances based on holistic pattern recognition. This will be tested against new and existing data from bilingual speakers. If successful, this will yield a fully general method that can be applied to speech from any language or any domain of language usage, allowing scientists to capitalize on the wealth of information in speech to develop powerful new insights into the mind and brain.

Improved detection of subtle problems with pronunciation, such as those that occur with Alzheimer’s disease, will advance our understanding of the brain mechanisms that humans use to produce speech. The results of this testing will also allow computer scientists to advance our understanding of how machine learning algorithms process sounds, driving improvements in the algorithms and supporting applications in any area of speech and language technology that relies on spoken language processing.

Speech variability across talkers provides a treasure trove of information for cognitive neuroscientists, leading to important insights into the cognitive mechanisms underlying language processing and potentially providing early signs of brain dysfunction. Current studies of speech are hamstrung by analyses that require preselecting specific temporal scales and acoustic dimensions.

We propose a radically different approach: using unsupervised deep learning to discover a representational space for analysis of acoustic variation. To test this highly general approach, this method will be compared to current state-of-the-art methods for analyzing individual variation in bilingual speech. This includes using the acoustic variation in second language speech to predict intelligibility and to detect difficulties in code-switching, particularly the challenges faced by individuals with Alzheimer’s disease.

The results will inform development of deep learning and cognitive neuroscience. The machine learning algorithm is fully general; it can be applied to speech from any language or any domain of language usage, expanding the range of populations and contexts that can be served by speech technology or studied by cognitive neuroscientists. The project’s integrative approach will allow computer scientists to advance our understanding of the extent to which modern deep learning architectures do or do not approximate human speech processing and allow cognitive neuroscientists to further our understanding of how meaningful acoustic distinctions are represented in speech perception and production.

This project is funded by the Integrative Strategies for Understanding Neural and Cognitive Systems (NCS) program, which is jointly supported by the Directorates for Computer and Information Science and Engineering (CISE), Education and Human Resources (EHR), Engineering (ENG), and Social, Behavioral, and Economic Sciences (SBE).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Northwestern University

