| Field | Value |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | Johns Hopkins University |
| Country | United States |
| Start Date | Oct 01, 2021 |
| End Date | Sep 30, 2025 |
| Duration | 1,460 days |
| Number of Grantees | 7 |
| Roles | Former Co-Principal Investigator; Principal Investigator; Former Principal Investigator; Co-Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2120435 |
The task of automatic speech recognition (ASR) and spoken language understanding embodies almost all the elements of artificial intelligence (AI). Reliable, ubiquitously available ASR will be a key enabler of robust intelligence research in spoken dialog systems for human-computer interaction; of information integration research in content-based multimedia search and access to oral history archives; and of fundamental speech science and technology, enabling research in children's cognitive development, linguistics, smart health, elderly care, education, and (broadly) the machine-aided study of behavioral and social dynamics.
This project, developed after extensive consultations with the speech and language research community, is extensively revising the Kaldi open-source toolkit to (a) make speech recognition more accessible both to beginners in speech recognition and to researchers in other fields, (b) leverage existing deep learning frameworks (primarily PyTorch) to increase its flexibility, (c) create new user training materials, and (d) continue to enhance the toolkit, so as to support the growth of and cooperation within the community.
The project implements all core Kaldi functions (e.g., the lattice-free maximum mutual information training objective) natively in generic AI/deep learning frameworks, primarily PyTorch, so that associated advances in deep learning (e.g., novel optimization algorithms) can be seamlessly leveraged. Furthermore, the project incorporates automatic differentiation through finite state transducers, a core Kaldi feature responsible for its state-of-the-art performance, permitting true end-to-end training of ASR systems.
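To illustrate the idea of differentiating through a finite state transducer, the sketch below runs the forward algorithm in the log semiring over a small acyclic FST whose arc log-weights are a learnable PyTorch tensor, so gradients flow through the path-sum computation via ordinary backpropagation. This is a minimal illustration only, not the project's actual implementation; the FST structure, function name, and use of a topologically sorted arc list are assumptions for the example.

```python
import torch

def fst_forward_logprob(arcs, weights, num_states, start, final):
    """Log-semiring forward algorithm over an acyclic FST.

    arcs: list of (src, dst, weight_index) tuples, topologically sorted
          by source state (an assumption of this sketch).
    weights: learnable 1-D tensor of arc log-weights.
    Returns the log total weight of all start-to-final paths.
    """
    neg_inf = torch.tensor(float("-inf"))
    # alpha[s] = log total weight of all paths from start to state s
    alpha = [neg_inf] * num_states
    alpha[start] = torch.tensor(0.0)
    for src, dst, w in arcs:
        # Log-semiring "plus" is logsumexp; "times" is addition.
        alpha[dst] = torch.logsumexp(
            torch.stack([alpha[dst], alpha[src] + weights[w]]), dim=0)
    return torch.logsumexp(torch.stack([alpha[f] for f in final]), dim=0)

# Hypothetical diamond-shaped FST: 0 -> 1 -> 3 and 0 -> 2 -> 3.
weights = torch.zeros(4, requires_grad=True)  # learnable arc log-weights
arcs = [(0, 1, 0), (0, 2, 1), (1, 3, 2), (2, 3, 3)]
logZ = fst_forward_logprob(arcs, weights, num_states=4, start=0, final=[3])
logZ.backward()  # gradients flow through the FST computation
```

After `backward()`, each entry of `weights.grad` is the posterior probability of the corresponding arc under the path distribution, which is exactly the quantity an LF-MMI-style objective needs from its denominator graph.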
These and other enhancements will make it possible to achieve two currently incompatible goals: incorporating structured external knowledge (e.g., dialog flow models, finite state grammars, pronunciation lexicons) into fully neural ASR systems, and end-to-end training of a hybrid ASR system via backpropagation. Other goals of this proposal include the provision of efficient yet user-friendly data preparation and model management tools for large-scale training of ASR systems, and capabilities for robust conversation analysis and speaker diarization needed by researchers who use ASR as a tool for other scientific inquiries.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.