Completed STANDARD GRANT National Science Foundation (US)

RI: Small: Learning Fine-Grained Instructions from Uncurated Complex Activity Videos

$4.98M USD

Funder	National Science Foundation (US)
Recipient Organization	Northeastern University
Country	United States
Start Date	Oct 01, 2021
End Date	Sep 30, 2025
Duration	1,460 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2115110`

Grant Description

Humans have a remarkable ability to learn complex tasks by watching others perform them and following their instructions. Bringing this capability to machines would have a far-reaching impact on the advancement of artificial intelligence, with important applications such as intelligent assistants and robots that learn to perform tasks, or guide humans through them, by mining instructional and everyday activity videos.

Despite recent advances, video and activity understanding methods face major challenges in converting raw, untrimmed, long videos of complex activities into detailed and accurate instructions. These include large appearance and motion variations of instructions across videos, the high cost of gathering dense temporal annotations for long videos, the lack of a systematic way to integrate the different types of available noisy yet inexpensive labels for effective learning, and the difficulty of generating long-range future instructions.

This project investigates a comprehensive mathematical framework for learning detailed and accurate instructions from long, untrimmed videos of complex activities, overcoming the aforementioned challenges. The research is accompanied by an integrated education and outreach plan, which involves mentoring high school and undergraduate students through Northeastern's Young Scholar Program and integrating the results of the project into undergraduate and graduate classes. The project will publicly release open-source software implementing the developed algorithms.

This project develops new unsupervised and self-supervised methods for task segmentation and subtask (instruction step) localization. It does so by investigating a multi-manifold model for tasks, simultaneously learning the manifolds and finding associations between them across videos while incorporating task constraints and priors. The developed framework handles large appearance and motion variations of subtasks across videos and can leverage other modalities, such as video narrations and audio.

The research team will develop a unified weakly-supervised visual grounding framework based on deep neural networks that learns from different types of available inexpensive, noisy weak labels, handles subtasks at the tail of the distribution, and generates future instructions from current observations. Furthermore, the team will investigate a new probabilistic deep learning framework with hierarchically connected modules corresponding to subtask, grammar, and task prediction, which makes it possible to integrate all types of weak labels and to generate plausible future subtask sequences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Northeastern University
