| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | Indiana University |
| Country | United States |
| Start Date | Apr 01, 2021 |
| End Date | Dec 31, 2024 |
| Duration | 1,370 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2046963 |
Current general-purpose speech enhancement systems rely on large models trained on big datasets of audio signals, and these models are too bulky to run on small personal devices. A personalized model can be a resource-efficient alternative because it focuses on a particular user and a specific test environment, for which a smaller model architecture can be good enough.
However, training a personalized model requires clean voice data from the test-time user in advance, which is not always available because of the user's privacy concerns or difficulties with recording. This CAREER project develops machine-learning methods that achieve personalization while requiring few or no data samples from the test-time user.
Because the project achieves personalization in a privacy-preserving and resource-efficient way, it is a step towards more accessible and affordable artificial intelligence for all members of society.
The project circumvents the lack of personal data in the context of personalized speech enhancement using no- and few-shot learning frameworks with help from adversarial and self-supervised learning. First, it verifies that a personalized system with reduced computational complexity can still compete with a generic model in speech enhancement performance.
To this end, the training algorithm divides the potentially large model into multiple sub-modules, each of which handles a particular sub-problem (e.g., a particular user's utterance). If the sub-problems are defined to be mutually exclusive, the test-time inference can be made efficiently by using only the most suitable sub-module. Since the sub-module selection is done on noisy speech, it achieves personalization with no additional training on the test user's data.
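The test-time selection step described above can be illustrated with a minimal sketch. It is not the project's implementation: the centroid-based selector, the per-frame feature lists, and the toy "specialists" (plain gain functions) are all hypothetical stand-ins, chosen only to show that a single sub-module is picked from the noisy input and that only that sub-module is run.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = List[float]
Specialist = Callable[[Frame], Frame]


def mean_vector(frames: Sequence[Frame]) -> Frame:
    """Average the per-frame feature vectors into one summary vector."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def select_specialist(noisy_frames: Sequence[Frame],
                      centroids: Sequence[Frame]) -> int:
    """Return the index of the centroid closest to the noisy input summary.

    Assumption: each sub-module is associated with one centroid in feature
    space. The selection uses only the noisy input, no clean reference.
    """
    summary = mean_vector(noisy_frames)
    distances = [math.dist(summary, centroid) for centroid in centroids]
    return distances.index(min(distances))


def enhance(noisy_frames: Sequence[Frame],
            centroids: Sequence[Frame],
            specialists: Sequence[Specialist]) -> Tuple[int, List[Frame]]:
    """Run only the selected sub-module; the others are never evaluated."""
    chosen = select_specialist(noisy_frames, centroids)
    return chosen, [specialists[chosen](frame) for frame in noisy_frames]


# Toy setup: two hypothetical specialists that simply apply different gains.
centroids = [[0.0, 0.0], [5.0, 5.0]]
specialists = [
    lambda frame: [0.5 * value for value in frame],
    lambda frame: [0.9 * value for value in frame],
]
noisy = [[4.8, 5.1], [5.2, 4.9]]
chosen, enhanced = enhance(noisy, centroids, specialists)
print(chosen, enhanced)
```

In this toy setup the inference cost is that of one small sub-module plus the selection step, regardless of how many sub-modules were trained. In a real system the centroids and gain functions would be replaced by learned components.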
Second, the project explores a no-shot learning approach, in which the fundamental challenge lies in optimizing a machine learning model when no target signal is available. To this end, an already-trained general-purpose model is fine-tuned for an unseen test environment using adversarial optimization. The third research topic handles the case in which a small amount of the user's clean speech is available, which falls into the category of few-shot learning.
The project overcomes the data shortage via a self-supervised learning method that learns effective features from noisy speech data, which is more readily available than clean speech. That way, the model can be prepared for a subsequent fine-tuning step, which can be done with only a few clean user-specific speech utterances.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.