| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | Harvard University |
| Country | United States |
| Start Date | Sep 01, 2021 |
| End Date | Aug 31, 2024 |
| Duration | 1,095 days |
| Number of Grantees | 2 |
| Roles | Principal Investigator; Co-Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2118096 |
Intelligent machines could help achieve major human goals. But even current state-of-the-art machines can catastrophically misunderstand what they were asked to do, resulting in machines that 'do what you asked, but not what you want'. In contrast to these failures of human-machine interactions, from an early age humans can quickly and efficiently communicate their goals, and find ways to cooperate and help.
But when people's values do not align, they find ‘loopholes’ to avoid cooperating or complying. Loopholes offer a unique window into the successful but opaque commonsense process of goal understanding. While loopholes are a pervasive everyday concern with real-world implications, there is little computational or cognitive research examining this phenomenon.
This project aims to study the mental processes that allow humans to intuitively and purposefully contort communication in loophole behavior. This research will help tackle central open challenges in the design of safe intelligent machines and human-technology interactions, and will improve our understanding of the emergence of social interactions.
Previous research has focused on how children learn to communicate socially and negotiate values, but not on how children and adults handle and exploit value misalignment. This raises a crucial question for cognitive science and human-machine interactions: how do people learn to go from ambiguous communication to the alignment of intended goals, plausible alternatives, and one’s own values?
Studies of development are particularly important in answering this question, as the developmental trajectory sheds light on which processes are foundational to this ability and which are brought in piecemeal with greater knowledge and experience. The project combines methods from AI, computational cognitive science, and social cognitive development, and will (1) characterize the emergence and scope of loopholes in the wild with large open databases using citizen-science and public data, (2) build a formal framework informed by the data for modeling the interpretation and (mis)alignment of social goals from sparse statements, (3) validate the framework using controlled experiments with diverse populations to study the evaluation of loophole-seeking from childhood to adulthood, and (4) extend this framework with novel studies on the inferred goals of machines in human-machine interactions.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.