Active STANDARD GRANT National Science Foundation (US)

Collaborative Research: HCC: Medium: Aligning Robot Representations with Humans

$3.14M USD

Funder	National Science Foundation (US)
Recipient Organization	University of Utah
Country	United States
Start Date	Aug 15, 2023
End Date	Jul 31, 2026
Duration	1,081 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2310759`

Grant Description

This project seeks to make robots more robust and aligned with human preferences and values. Traditionally, robot behaviors and objectives were trained to include a set of hand-crafted features (i.e., variables represented in the data) that reflect task-relevant aspects of the environment. Using well-chosen features is very data-efficient, but it is unrealistic for human engineers to identify and write code ahead of time for all the features that could matter.

Training modern high-capacity models from a lot of data is a great alternative, as long as we do not probe the learned models on novel (out-of-distribution) inputs. The reason these models fail to generalize to out-of-distribution inputs is that they will generally fail to learn the correct representation, comprising the features that matter, and instead pick up on spurious patterns in the data.

The central goal of this project is to enable robots to arrive at the underlying correct representation for objectives (and, hence, behaviors). And since learning the objective function---what the human user wants---is fundamentally about humans, this work proposes that only the human can determine what actually matters vs. what is spurious. The research will introduce the problem of aligning robot representations to humans.

The key observation behind the project is that traditional input used in learning, such as demonstrations or comparisons, which is designed to teach the robot the full task, is not ideal for aligning the robot’s representation. With representation alignment defined as a problem, there is the opportunity to design new types of human feedback that help the robot explicitly isolate the right representation.

The project will develop new types of human feedback and algorithms for efficiently learning from them to arrive at an aligned representation.

Preliminary work leveraged this observation to introduce feature traces---a novel type of human input through which users can teach the robot about specific features they care about. The project will pursue four objectives that together tackle the aspects of aligning robot representations with humans: (1) Teaching one feature at a time, beyond feature traces: It will investigate new input types for aligning robot representations with users, contribute active learning algorithms that help the human teacher provide the most informative input, and build transparency tools that enable robots to teach back to the user their current understanding of the representation. (2) Extracting features all at once from new, representation-specific human input: It will investigate new human input types that teach the full representation all at once by combining self-supervised representation learning methods with human-centric representation learning. (3) Using a correct representation in the right way: Given a new task, the robot needs to learn which features matter and in which contexts. (4) Extending earlier work to policy learning: It will extend new tools to the policy learning setting and use the lens of human-aligned representations to enable better policy generalization to new users and to improve goal mis-generalization in reinforcement learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Utah

Interested in applying for this grant?

Complete our application form to express your interest and we'll guide you through the process.

Apply for This Grant

Collaborative Research: HCC: Medium: Aligning Robot Representations with Humans

Grant Description

All Grantees

Interested in applying for this grant?

Quick Summary

Related Grants