| Field | Value |
|---|---|
| Funder | National Institute on Deafness and Other Communication Disorders |
| Recipient Organization | University of Miami Coral Gables |
| Country | United States |
| Start Date | Feb 01, 2021 |
| End Date | Jan 31, 2026 |
| Duration | 1,825 days |
| Number of Grantees | 2 |
| Roles | Principal Investigator; Co-Investigator |
| Data Source | NIH (US) |
| Grant ID | 10411575 |
**Project Summary**

This Administrative Supplement proposes the implementation of a multimodal data pipeline to support machine learning of child language production in complex naturalistic environments.
The Supplement builds on the parent R01 (DC018542) that gathers objective, longitudinal data to capture the vocal interactions of children with hearing loss (HL). Even with cochlear implantation, HL is a life-altering condition with high social costs.
Inclusion of children with HL and typically hearing (TH) peers in preschool classrooms is a national standard, but it is not clear how early vocal interaction contributes to the language development of children with HL and their TH peers.
The parent R01 employs computational models of child location and orientation to indicate when children are in social contact with their peers and teachers.
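For concreteness, here is a minimal sketch of how location and orientation data might flag social contact. The two-metre and 90-degree thresholds, the function names, and the 2-D geometry are illustrative assumptions, not the parent R01's actual models:

```python
import numpy as np

def in_social_contact(pos_a, pos_b, heading_a, heading_b,
                      max_distance=2.0, max_facing_deg=90.0):
    """Hypothetical contact rule: two people are in social contact when
    they are within max_distance metres and each is oriented toward the
    other within max_facing_deg degrees. Positions are (x, y) in metres;
    headings are degrees in the same planar frame."""
    offset = np.asarray(pos_b, float) - np.asarray(pos_a, float)
    if np.linalg.norm(offset) > max_distance:
        return False

    def faces(heading_deg, toward):
        # Angle between the person's heading vector and the line to the other.
        heading = np.array([np.cos(np.radians(heading_deg)),
                            np.sin(np.radians(heading_deg))])
        cos_gap = heading @ toward / np.linalg.norm(toward)
        return np.degrees(np.arccos(np.clip(cos_gap, -1.0, 1.0))) <= max_facing_deg

    return faces(heading_a, offset) and faces(heading_b, -offset)
```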
Machine learning offers an additional strategy for pursuing the R01's broad goal of identifying interactive contexts in which children produce phonemically complex vocalizations and interactive speech.
Machine learning algorithms can determine the contextual, individual, and interactive factors that predict children's vocalizations and vocal interactions.
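As a sketch of what such a predictive model could look like, the snippet below fits a logistic regression on placeholder data; the features, window size, and choice of scikit-learn are assumptions for illustration, not the project's planned analysis:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000  # hypothetical 5-second classroom windows

# Illustrative contextual/interactive features per window.
X = np.column_stack([
    rng.uniform(0.3, 6.0, n),    # distance to nearest peer (m)
    rng.uniform(0.0, 180.0, n),  # orientation offset from nearest peer (deg)
    rng.integers(0, 5, n),       # number of peers within 2 m
    rng.integers(0, 2, n),       # did a partner vocalize in the prior window?
])
y = rng.integers(0, 2, n)        # placeholder label: target child vocalized?

model = LogisticRegression(max_iter=1000)
print(cross_val_score(model, X, y, cv=5).mean())  # near chance on random data
```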
However, the parent R01 does not propose machine learning, nor are data disseminated in a format designed to facilitate machine learning.
To facilitate machine learning in the classroom, a rigorous diarization process is required to determine speaker identity, which is operationalized as the likelihood that each vocalization was spoken by a given child or teacher.
We will integrate audio processing of each target child's and teacher's first-person audio recording with processing of their interactive partners' recordings.
The influence of partner recordings will be determined by their physical distance and orientation relative to the target. This will yield a weighted speaker identification score for each vocalization.
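One way such a weighted score could be computed is sketched below; the energy measure, the inverse-square distance decay, and the cosine orientation falloff are illustrative assumptions rather than the pipeline's actual weighting:

```python
import numpy as np

def weighted_speaker_score(target_energy, partner_energies,
                           partner_distances, partner_angles):
    """Illustrative weighting: score the likelihood that a detected
    vocalization was produced by the target wearer.

    Partner recordings argue against target speech when they captured the
    vocalization loudly; their influence is discounted by distance (m) and
    by how far (deg) they were oriented away from the target.
    """
    e = np.asarray(partner_energies, float)
    d = np.asarray(partner_distances, float)
    a = np.radians(np.asarray(partner_angles, float))
    # Hypothetical influence weights: inverse-square distance decay and a
    # cosine falloff that is 1 when facing the target, 0 when facing away.
    influence = (1.0 / (1.0 + d ** 2)) * (0.5 * (1.0 + np.cos(a)))
    partner_evidence = float(np.sum(e * influence))
    return target_energy / (target_energy + partner_evidence + 1e-9)
```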
For 25% of the sample, the algorithmic score will be compared to speaker identification provided by trained coders to quantify intersystem reliability.
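A plausible form for that comparison, assuming the algorithmic label is the arg-max speaker and using Cohen's kappa alongside raw agreement (function and variable names here are hypothetical):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def intersystem_reliability(algo_probs, coder_labels, speakers):
    """Compare algorithmic and human speaker identification on the
    double-coded 25% subsample (illustrative, not the project's protocol).

    algo_probs:   (n_vocalizations, n_speakers) probability matrix
    coder_labels: trained coders' speaker label per vocalization
    speakers:     speaker identifier for each probability column
    """
    algo_labels = np.asarray(speakers)[np.argmax(algo_probs, axis=1)]
    agreement = float(np.mean(algo_labels == np.asarray(coder_labels)))
    kappa = cohen_kappa_score(algo_labels, coder_labels)
    return agreement, kappa
```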
Processed datasets will include 7,160 hours of multimodal recordings of child and teacher movement in classrooms synchronized with continuously recorded, child- and teacher-specific (first-person) audio recordings.
De-identified output data will characterize vocalizations with respect to algorithmically computed speaker identification probabilities, coder-identified speaker identity (25% of sample), phonemic complexity and audio characteristics (e.g., fundamental frequency), as well as the position and relative orientation of all individuals in the classroom, and child demographics (including characterizations of HL).
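To make the dataset's shape concrete, here is a sketch of what one de-identified output record might contain; every field name below is an assumption inferred from this summary, not the released schema:

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class VocalizationRecord:
    vocalization_id: str
    speaker_probs: dict[str, float]    # algorithmic speaker-ID probabilities
    coder_speaker: str | None          # coder label on the 25% subsample
    phonemic_complexity: float
    f0_hz: float                       # fundamental frequency
    positions: dict[str, tuple[float, float]]  # (x, y) per person, metres
    orientations: dict[str, float]             # heading per person, degrees
    demographics: dict[str, str]               # de-identified, incl. HL status
```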
Over the course of the supplement, output data, Python processing code, and metadata descriptions of the processing pipeline will be disseminated through dedicated distribution portals, including GitHub, Kaggle, and the UCI Machine Learning Repository. Recordings will be released to certified investigators via NIH-funded repositories such as Databrary and HomeBank.