Active CONTINUING GRANT National Science Foundation (US)

SHF: Medium: Reasoning about Multiplicity in the Machine Learning Pipeline

$7.91M USD

Funder	National Science Foundation (US)
Recipient Organization	University of California-San Diego
Country	United States
Start Date	Oct 01, 2024
End Date	Sep 30, 2027
Duration	1,094 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2446711`

Grant Description

Machine learning is deployed across various domains (e.g., finance, education, hiring) with the assumption that model outcomes are accurate and authoritative. But in reality, the specific model that is deployed is just one option of many: previous work has shown that multiplicity – the existence of multiple equally good models – arises at many stages of the machine learning pipeline.

Formally reasoning about multiplicity is challenging because of the large (potentially infinite) set of models that must be taken into account. As a result, existing techniques can reason only about certain forms of model-based multiplicity, and generally only with empirical guarantees. The project’s novelty is a set of approaches that increase the auditability of machine learning pipelines.

These approaches comprise frameworks and formal techniques for understanding how multiplicity in the dataset creation and modeling processes affects the final learned model that is deployed. The project’s impact is especially prominent in domains where the decisions of machine-learned models directly affect humans: understanding multiplicity is vital for developing machine learning models that are fair and robust.

The investigators help organize outreach programs that introduce high school and undergraduate students from underrepresented backgrounds to computer science and machine learning topics.

This project investigates multiplicity for diverse model architectures across the whole machine learning pipeline, including training data, model predictions, and model explanations. The research integrates formal methods and robust machine learning techniques to help determine whether machine learning outcomes are reliable or merely an artifact of multiplicity.

For instance, the investigators study algorithms to certify (deterministically or probabilistically, depending on the model architecture) whether a model’s prediction is robust under various sources of multiplicity.
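To make the idea of certifying a prediction under model multiplicity concrete, here is a minimal toy sketch. It is not the investigators' method. It assumes a deliberately tiny hypothesis class (one-dimensional threshold classifiers that predict 1 when `x >= t`), a hypothetical tolerance `epsilon` that defines which models count as "equally good", and made-up data. Every name in it is illustrative. Because the class is so small, the full set of near-optimal thresholds can be described exactly as a list of intervals, so the check is deterministic.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, int]            # (feature value, label in {0, 1})
Interval = Tuple[float, float]       # thresholds t with lo < t <= hi


def good_intervals(data: List[Point], epsilon: float) -> List[Interval]:
    """Threshold intervals whose training accuracy is within epsilon of the best.

    All thresholds in (lo, hi] between two neighbouring training values
    classify the training set identically, so one accuracy per interval suffices.
    """
    xs = sorted({x for x, _ in data})
    bounds = [-math.inf] + xs + [math.inf]
    intervals = list(zip(bounds, bounds[1:]))

    def acc(hi: float) -> float:
        # Any t in (lo, hi] labels the training points exactly like t = hi.
        return sum(int(x >= hi) == y for x, y in data) / len(data)

    best = max(acc(hi) for _, hi in intervals)
    return [(lo, hi) for lo, hi in intervals if acc(hi) >= best - epsilon]


def possible_predictions(x: float, intervals: List[Interval]) -> Set[int]:
    """Every label that some near-optimal threshold assigns to x."""
    preds: Set[int] = set()
    for lo, hi in intervals:
        if x >= hi:        # every t in (lo, hi] satisfies t <= x
            preds.add(1)
        elif x <= lo:      # every t in (lo, hi] satisfies t > x
            preds.add(0)
        else:              # lo < x < hi: thresholds on both sides of x exist
            preds.update({0, 1})
    return preds


def is_robust(x: float, data: List[Point], epsilon: float) -> bool:
    """True if all near-optimal thresholds agree on the label of x."""
    return len(possible_predictions(x, good_intervals(data, epsilon))) == 1


# Made-up training data in which two different thresholds fit equally well.
DATA: List[Point] = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1),
                     (4.0, 0), (5.0, 1), (6.0, 1), (7.0, 1)]

if __name__ == "__main__":
    for query in (1.0, 3.0, 6.0):
        print(query, is_robust(query, DATA, epsilon=0.0))
```

On this data, thresholds in (2, 3] and in (4, 5] both reach the best training accuracy, so a query at 3.0 receives conflicting labels, while queries at 1.0 and 6.0 receive the same label from every near-optimal model. For realistic model classes the set of good models cannot be enumerated this way, which is where the formal and probabilistic techniques described above come in.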

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of California-San Diego
