Active CONTINUING GRANT National Science Foundation (US)

CRCNS Research Proposal: Learning by Looking: Modeling visual system representation formation via foveated sensing in a 3-D world

$11.9M USD

Funder: National Science Foundation (US)
Recipient Organization: Harvard University
Country: United States
Start Date: Oct 01, 2023
End Date: Sep 30, 2026
Duration: 1,095 days
Number of Grantees: 3
Roles: Principal Investigator; Co-Principal Investigator
Data Source: National Science Foundation (US)
Grant ID: 2309041
Grant Description

Despite remarkable recent strides, current computer vision models still struggle to understand the shapes of objects and the structure of the physical world as effectively as humans do. This project begins from the premise that the human ability to perceive and understand the visual world hinges on active exploration. By constructing a model that emulates key aspects of the human visual system--and learns to see through active exploration of its environment--this project seeks to broaden the understanding of human vision and assess whether machine learning systems that emulate human vision can harness some of its strengths.

This scientific exploration carries considerable implications for cognitive neuroscience and machine learning, and has the potential to inform the development of more robust and reliable artificial vision systems.

To enhance computational understanding of vision and bridge the gap between human and machine perception, this project aims to develop a biologically informed model that learns from active exploration of the visual world. Current computer vision models, constrained by uniform sampling of static two-dimensional (2D) views, fall short in representing the structure of a three-dimensional (3D) environment.

This project tests the hypothesis that active sensing in a structured three-dimensional environment will lead to more robust representations that better capture world geometry, and also correspond more closely to the biological system. The project's main objectives are: (1) to develop a model with separate foveal and peripheral subsystems, coupled with a contrastive learning approach that contrasts foveal and peripheral views, enabling the system to bootstrap its own learning without explicit labels and without using artificial transformations; (2) to place this self-supervised learning system within a vivid, high-definition 3D realm, promoting active learning via diverse sampling policies; and (3) to thoroughly assess whether the model's learned representations demonstrate enhanced geometric representation relative to conventional models, using both computer vision benchmarks and alignment with human neural (fMRI) responses.
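As a rough illustration of objective (1), the sketch below computes a contrastive loss in which the foveal embedding of a fixation is pulled toward the peripheral embedding of the same fixation and pushed away from peripheral embeddings of other fixations. The grant description does not name a specific loss; the InfoNCE form, the function names, the temperature value, and the toy embeddings are all assumptions made for illustration, not the project's actual method.

```python
import math

def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def info_nce(foveal, peripheral, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired embeddings.

    foveal[i] and peripheral[i] are assumed to come from the same fixation
    (the positive pair); every peripheral[j] with j != i acts as a negative.
    The temperature of 0.1 is an arbitrary illustrative choice.
    """
    f = [normalize(v) for v in foveal]
    p = [normalize(v) for v in peripheral]
    total = 0.0
    for i, fi in enumerate(f):
        # Cosine similarity of this foveal view to every peripheral view.
        logits = [sum(a * b for a, b in zip(fi, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the matching peripheral view.
        total += log_denom - logits[i]
    return total / len(f)

# Hypothetical 2-D embeddings for two fixations.
foveal = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # peripheral views agree with foveal views
swapped = [[0.1, 0.9], [0.9, 0.1]]   # peripheral views paired with the wrong fixation

loss_aligned = info_nce(foveal, aligned)
loss_swapped = info_nce(foveal, swapped)
print(loss_aligned < loss_swapped)
```

In this toy setup the loss is lower when the two views of each fixation agree, which is the signal such a system could use to learn without labels or artificial augmentations. In a real model the embeddings would come from the separate foveal and peripheral subsystems rather than being written by hand.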

Through these aims, this project seeks to deepen understanding of how humans learn to see, strengthen the ties between computational neuroscience and machine learning, and pave the way for more human-like, robust, and reliable artificial vision technologies.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Harvard University
