| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | New York University |
| Country | United States |
| Start Date | Feb 15, 2025 |
| End Date | Jan 31, 2030 |
| Duration | 1,811 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2443404 |
The real world is abundant with visual information, requiring both humans and intelligent systems to navigate and adapt to its complexity efficiently. Human visual processing can be broadly categorized into two types: intuitive processing, which enables rapid, experience-driven decision-making, and deliberate processing, which involves focused and systematic reasoning for complex tasks.
Robust visual intelligence, the ability to integrate these two types of processing, is fundamental to real-world problem solving and to developing common-sense understanding. However, current artificial intelligence (AI) systems lack this level of robustness, as they often rely heavily on language models to compensate for deficiencies in their visual architectures.
This reliance becomes a critical bottleneck as tasks and scenarios grow more complex, limiting the adaptability and reliability of AI systems in practical applications. The project aims to build a hybrid, vision-centric framework that integrates intuitive and deliberate visual processing to create more robust visual intelligence. This improved capability has the potential to empower research in many other fields.
For example, robotics researchers could use the system to help robots see, move, and act more intelligently in challenging environments. Similarly, scientists and educators could use it to better understand complex diagrams, making it easier to uncover new insights and improve learning.
The project aims to develop a hybrid approach that combines parametric (intuitive) and non-parametric (deliberate) visual processing methods to build robust visual intelligence. The first direction focuses on developing new ways to learn visual representations using techniques like visual self-supervised learning, language guidance, and generative modeling.
The goal of this direction is to advance vision-centric parametric knowledge that forms the foundational layer of intuitive understanding. Building on this, the second direction incorporates human-like non-parametric mechanisms, such as visual search and working memory, to enhance deliberate reasoning capabilities. The third direction integrates these two approaches into a unified, hybrid architecture, in which a high-level controller activates either method as needed to meet task-specific demands.
Finally, the system will be tested in real-world, dynamic environments that extend beyond static image datasets, including tasks requiring long-form video analysis and visual-spatial reasoning. By applying the framework to these new challenges, the project aims to produce more adaptable, reliable, and broadly useful methods that expand the possibilities of vision-based applications.
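The routing idea in the third research direction can be illustrated with a minimal sketch. Everything here is hypothetical: the `VisualQuery` class, the `complexity` score, the routing `threshold`, and both processing paths are illustrative stand-ins, not the project's actual design. The sketch only shows the control pattern: a cheap parametric pass for easy inputs, and a slower, iterative pass (standing in for visual search with working memory) for hard ones.

```python
from dataclasses import dataclass

@dataclass
class VisualQuery:
    """Toy stand-in for a visual task: an image reference plus a
    hypothetical difficulty estimate in [0, 1]."""
    image_id: str
    complexity: float

def intuitive_pass(query: VisualQuery) -> str:
    """Fast, parametric path: conceptually a single forward pass
    through a learned visual model."""
    return f"intuitive answer for {query.image_id}"

def deliberate_pass(query: VisualQuery) -> str:
    """Slow, non-parametric path: conceptually an iterative visual
    search that accumulates intermediate results in working memory."""
    memory: list[str] = []
    for step in range(3):  # bounded number of search steps
        memory.append(f"fixation {step} on {query.image_id}")
    return f"deliberate answer for {query.image_id} after {len(memory)} steps"

def controller(query: VisualQuery, threshold: float = 0.5) -> str:
    """High-level controller: route each query to the processing path
    that matches its estimated task demands."""
    if query.complexity < threshold:
        return intuitive_pass(query)
    return deliberate_pass(query)
```

In a real system the controller itself would likely be learned rather than a fixed threshold, but the dispatch structure, one fast path and one iterative path behind a single entry point, is the essence of the hybrid architecture described above.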
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.