| Field | Value |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | Johns Hopkins University |
| Country | United States |
| Start Date | Oct 01, 2021 |
| End Date | Sep 30, 2025 |
| Duration | 1,460 days |
| Number of Grantees | 7 |
| Roles | Former Co-Principal Investigator; Principal Investigator; Former Principal Investigator; Co-Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2120435 |
The task of automatic speech recognition (ASR) and spoken language understanding embodies almost all the elements of artificial intelligence (AI). Reliable, ubiquitously available ASR will be a key enabler of robust intelligence research in spoken dialog systems for human-computer interaction; of information integration research in content-based multimedia search and access to oral history archives; and of fundamental speech science and technology, enabling research in children's cognitive development, linguistics, smart health, elderly care, education, and (broadly) the machine-aided study of behavioral and social dynamics.
This project, developed after extensive consultations with the speech and language research community, is extensively revising the Kaldi open-source toolkit to (a) make speech recognition more accessible both to beginners in speech recognition and to researchers in other fields, (b) leverage existing deep learning frameworks (primarily PyTorch) to increase its flexibility, (c) create new user training materials, and (d) continue to enhance the toolkit, so as to support the growth of and cooperation within the community.
The project implements all core Kaldi functions (e.g., the lattice-free maximum mutual information training objective) natively in generic AI/deep learning frameworks, primarily PyTorch, so that associated advances in deep learning (e.g., novel optimization algorithms) can be seamlessly leveraged. Furthermore, the project incorporates automatic differentiation through finite state transducers, a core Kaldi feature responsible for its state-of-the-art performance, permitting true end-to-end training of ASR systems.
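To illustrate the idea of differentiating through a finite state transducer, the sketch below runs the forward algorithm in the log semiring over a small acyclic FST whose arc log-weights are a learnable PyTorch tensor, so gradients flow through the path-sum computation via ordinary backpropagation. This is a minimal illustration only, not the project's actual implementation; the FST structure, function name, and use of a topologically sorted arc list are assumptions for the example.

```python
import torch

def fst_forward_logprob(arcs, weights, num_states, start, final):
    """Log-semiring forward algorithm over an acyclic FST.

    arcs: list of (src, dst, weight_index) tuples, topologically sorted
          by source state (an assumption of this sketch).
    weights: learnable 1-D tensor of arc log-weights.
    Returns the log total weight of all start-to-final paths.
    """
    neg_inf = torch.tensor(float("-inf"))
    # alpha[s] = log total weight of all paths from start to state s
    alpha = [neg_inf] * num_states
    alpha[start] = torch.tensor(0.0)
    for src, dst, w in arcs:
        # Log-semiring "plus" is logsumexp; "times" is addition.
        alpha[dst] = torch.logsumexp(
            torch.stack([alpha[dst], alpha[src] + weights[w]]), dim=0)
    return torch.logsumexp(torch.stack([alpha[f] for f in final]), dim=0)

# Hypothetical diamond-shaped FST: 0 -> 1 -> 3 and 0 -> 2 -> 3.
weights = torch.zeros(4, requires_grad=True)  # learnable arc log-weights
arcs = [(0, 1, 0), (0, 2, 1), (1, 3, 2), (2, 3, 3)]
logZ = fst_forward_logprob(arcs, weights, num_states=4, start=0, final=[3])
logZ.backward()  # gradients flow through the FST computation
```

After `backward()`, each entry of `weights.grad` is the posterior probability of the corresponding arc under the path distribution, which is exactly the quantity an LF-MMI-style objective needs from its denominator graph.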
These and other enhancements will make it possible to achieve two currently incompatible goals: incorporating structured external knowledge (e.g., dialog flow models, finite state grammars, pronunciation lexicons) into fully neural ASR systems, and end-to-end training of a hybrid ASR system via backpropagation. Other goals of this proposal include the provision of efficient yet user-friendly data preparation and model management tools for large-scale training of ASR systems, and capabilities for robust conversation analysis and speaker diarization needed by researchers who use ASR as a tool for other scientific inquiries.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.