| Field | Value |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | San Francisco State University |
| Country | United States |
| Start Date | Jul 01, 2024 |
| End Date | Jun 30, 2026 |
| Duration | 729 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2347727 |
The goal of this Engineering Research Initiation (ERI) project is to improve the independence, accessibility, safety, and well-being of visually impaired persons (VIPs) by developing an AI-based sensory augmentation system that leverages natural, representative sound. The research hypothesizes that representable synchronous sound extracted from visual data will augment visual perception more effectively than conventional auditory cues or captioning.
This research aims to enhance spatial cognition, streamline navigation, increase personal freedom and empowerment, and support the health and well-being of visually impaired individuals. The project will address the challenges of analyzing complex scenes with multiple sound sources and of generating representable synchronous sound through audio-visual synchronization with moving objects.
The system will provide users with content-relevant, augmented sound alerts for the most immediate events. Automatically generating representable sound for critical situations is expected to help VIPs comprehend and respond to real-world events. Additional deliverables of this project include engineering research experiences for students from a wide spectrum of backgrounds, including under-represented minority groups.
The research will develop a novel AI-driven framework for human sensory augmentation that leverages the potential of semantic representative sound and addresses current limitations in accounting for subsequent motion changes in a visual scene. The design will incorporate a visual-to-sound synthesis AI model built on an efficient visual action recognition approach, followed by a deep generative sound model supported by cloud-based continuous learning.
The project includes an empirical study of the impact of natural sound compared with vocal auditory cues for VIPs. This study will help assess how natural auditory guidance assists VIPs in understanding the layout of their surroundings and supports spatial orientation and mobility.
The study will also help analyze how interacting with objects and scenarios in the environment while using the aid affects the independence, accessibility, and overall well-being of visually impaired people. This research has broad societal impacts: it establishes a path towards expanding multimodal AI research in multiple directions. The proposed framework can contribute to developing cyber-physical/IoT systems for human sensory augmentation using advanced AI techniques.
The proposed visual-to-audio synthesis model has the potential to impact diverse applied fields, including cyber-physical system design for safety and security applications; multimodal signal processing and system integration for next-generation multimedia systems; and engineering of AI systems for enhanced embedded battlefield intelligence with the necessary situational awareness.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.