Active CONTINUING GRANT National Science Foundation (US)

SaTC: CORE: Small: A Framework for Safety Assurance in Foundation Models

$3.09M USD

Funder	National Science Foundation (US)
Recipient Organization	Virginia Polytechnic Institute and State University
Country	United States
Start Date	Oct 01, 2024
End Date	Sep 30, 2027
Duration	1,094 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2424127`

Grant Description

The rapid advancement and widespread deployment of generative artificial intelligence foundation models, particularly large language models, have also led to an escalation of risks. Despite the tremendous efforts put into their safety alignment, these models have been shown to be fragile, vulnerable to "jailbreaking" attacks against safeguards built into deployed models, and prone to systematic deterioration after custom fine-tuning.

The goal of this project is to advance our understanding of the fundamental causes of safety issues and innovate more effective methodologies for ensuring safety, thereby enabling the trustworthy deployment of foundation models in diverse applications. The scientific advances resulting from this work will support the pressing national need for trustworthy artificial intelligence that benefits society at large.

Furthermore, this project will impact a broader audience through the organization of workshops, innovation competitions, and the development of educational curricula.

This project will pursue three tasks that weave safety measures throughout a model's lifecycle: (1) conduct in-depth analyses of the model's behavior and contributing factors to identify the root causes of harmful outputs and develop targeted interventions; (2) develop a comprehensive testing framework that subjects the model to diverse simulated threats, assessing its resilience, identifying vulnerabilities in human-like interactions, and developing effective countermeasures to enhance robustness; and (3) explore methods to integrate safety constraints into the model adaptation process and establish systems for continuous monitoring of the model's behavior in real-world applications, so that the model remains secure, reliable, and aligned with desired behaviors as it is applied in new contexts.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Virginia Polytechnic Institute and State University
