Loading…
Loading grant details…
| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | University of Pittsburgh |
| Country | United States |
| Start Date | Oct 01, 2023 |
| End Date | Sep 30, 2026 |
| Duration | 1,095 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2329992 |
Object detection methods are trained with data that is geographically biased in terms of both the visual appearance and supervision. This limits applicability to populations outside North America and Europe. The recent trend of training object detection systems with information from text accompanying images, also implies a shift in the supervision these systems receive.
Members of different countries/cultures may produce text with varying structure and content for the same image. This project studies how objects (e.g., cleaning equipment) appear differently in countries with different geographic, cultural, and economic conditions, and how information about these countries may be used to bridge the gaps in appearance.
The project also investigates which objects the speakers of a particular language mention and at what level of specificity (e.g., “dog” vs “beagle”). Finally, the research studies how to best enable detection of rare, culture-specific concepts. This project involves extensive work with members of different countries/cultures who will consult the researchers on reasons for varying object appearance and naming and discuss the benefits and dangers of computer vision in their everyday lives.
It will also support training for graduate and undergraduate students, from a diverse population in the Pittsburgh community.
This project examines the reasons for objects to appear differently in different domains (countries) and develops techniques to enable detection in target domains with limited data (e.g., Africa) using the knowledge from language models, such as country-specific distinctive features of objects and information about the visual environment in a country and combining these with domain adaptation and prompt learning. Very limited prior work studied the problem of visual appearance and background shifts across geographic regions.
This shows that standard domain adaptation methods do not help to bridge domain gaps; this project instead seeks help from diverse types of knowledge in language models. The project also studies how cultures provide different descriptions (captions) of images, in terms of objects mentioned and the levels of an entity hierarchy used to name objects. Prior work in language-supervised object detection fails to account for cross-language differences in captions, and multilingual work in retrieval does not focus on objects.
The researchers will develop techniques to leverage machine translation in a manner that is aware of different specificity of object naming, and to share information across languages using soft positives in a contrastive learning framework. Furthermore, the project will study how to enable training for objects specific to given cultures, that may not be available in the vision-language pretraining set.
While open-vocabulary work examines detection of rare objects, it does not exclude those from the pre-training set. This project aims to reduce a model’s reliance on object names, boost reliance on descriptive context (e.g., attributes), and leverage related categories. Beyond the project’s findings, the team will develop a tool and collect new annotations to share with the community: bounding boxes for datasets that only offer captions in different languages, and captions for datasets collected in different countries, but only providing object annotations.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
University of Pittsburgh
Complete our application form to express your interest and we'll guide you through the process.
Apply for This Grant