Active CONTINUING GRANT National Science Foundation (US)

CAREER: What is in a Voice?: Scientific and Machine Learning Advancement for Voice Conversion

$2.94M USD

Funder	National Science Foundation (US)
Recipient Organization	Johns Hopkins University
Country	United States
Start Date	Jan 01, 2025
End Date	May 31, 2029
Duration	1,611 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2533652`

Grant Description

Prior research and applications of voice conversion models have raised challenging problems that are both theoretical and use-inspired. Notable challenges include processing emotional speech and speech in noisy environments, and generating speech that represents the characteristics and expressiveness of specific speakers, such as personality traits, mood, prosody, and emotional state.

These challenges are exacerbated by the limited availability of data. Improving such capabilities will have a wide range of social impacts: giving a natural voice to patients who have lost theirs, producing comprehensible and speaker-faithful renderings of old, poor-quality recordings that have become hard to understand, and generating seamless speech translations in real-time communications while staying faithful to the voice characteristics of the speaker.

To address these challenges, the project proposes to explore and expand theories about speaker identity, emotion, and expressiveness in challenging conditions. Practically, this means studying how factors like background noise, emotions such as stress, cultural differences, or other idiosyncratic ways of speaking affect a system’s ability to recognize and faithfully render the speech of a specific individual.

This work will enable a second aim of this project, which is to create voice technology that can be used to safeguard the ethical and responsible use of voice generation. Sophisticated voice conversion techniques can be used to detect and prevent spoofing and other fraudulent activities, and to make it challenging for unauthorized users to mimic or imitate target speakers.

Besides security and defense, other areas that will benefit from this project include accessibility and healthcare assistive technologies, medical voice preservation, speech therapy and rehabilitation, as well as entertainment and gaming.

This award aims to develop novel algorithms utilizing deep learning techniques to advance voice conversion models with the ability to represent faithfully the characteristics and emotional states of individual speech. The project includes the following key areas of research. The first research target is to explore learning speaker identity and emotion representations for robust voice conversion with self-supervision.

By investigating joint representations, this project seeks to develop a deeper understanding of how speaker characteristics and emotions can be effectively transformed. The second research target is to investigate voice conversion solutions for challenging conditions, such as noisy environments, emotional speakers, and limited training data, to enhance the expressiveness and naturalness of the converted speech.

The third research target is to investigate novel deep learning techniques for the detection of synthetic voices and joint training strategies to further improve voice conversion performance and evaluation. By exploring the synergies between transformation and detection of synthetic voices, this project has the potential to significantly impact society by a) enabling accurate and expressive voice-based applications and b) applying the same techniques to detect whether speech is naturally occurring or synthetic, for the prevention of spoofing.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Johns Hopkins University
