| Field | Value |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | University of Washington |
| Country | United States |
| Start Date | Apr 01, 2025 |
| End Date | Jun 30, 2026 |
| Duration | 455 days |
| Number of Grantees | 2 |
| Roles | Principal Investigator; Co-Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2451163 |
Artificial Intelligence (AI) methods depend on data. However, much of the most valuable biomedical data is subject to strict access controls because of its sensitivity. These controls keep the data from being findable, accessible, interoperable, and reusable (FAIR), effectively making it “dark data.” Dark data is a major reason why AI remains underdeveloped in healthcare.
To address this problem, this project will develop methods that can safely and securely illuminate dark health data and make it accessible for research without compromising the privacy of the data donors. By creating fully open but privacy-preserving replicas of the original datasets, the project will empower researchers to find, access, inspect, socialize, critique, and reuse this data.
Systematically illuminating dark data will support beneficial AI health applications, particularly for data stored in isolated repositories, such as data about rare diseases.
This project’s goal is to advance the state of the art in privacy-preserving biomedical data sharing by developing a software library with efficient algorithms and cryptographic protocols. The library will enable data custodians from secured, siloed repositories to contribute data to a joint and secure synthetic data generation process. The solution combines Secure Multiparty Computation with Differential Privacy techniques to ensure both input privacy (data custodians do not need to disclose their data to anyone, including intermediary and aggregating servers) and output privacy (the synthetic datasets do not reveal sensitive information about individuals in the training data).
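The combination of input and output privacy can be illustrated with a toy example. The sketch below is not the project's library or protocol. It assumes additive secret sharing as the Secure Multiparty Computation building block and the Laplace mechanism as the Differential Privacy building block, applied to a single joint count. All names (`share`, `secure_sum`, `dp_total`) and the per-site counts are hypothetical. For simplicity the noise is added after the total is reconstructed; a real protocol would presumably sample the noise inside the secure computation so that the exact total is never revealed.

```python
import random
import secrets

# Public modulus for additive sharing; it only needs to exceed the true total.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split a non-negative integer into n additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_counts):
    """Sum the custodians' counts without any party seeing another's input."""
    n = len(private_counts)
    # Row i holds the shares custodian i sends out, one per party.
    sent = [share(count, n) for count in private_counts]
    # Party j adds up the j-th share it received from every custodian.
    partial_sums = [sum(row[j] for row in sent) % MODULUS for j in range(n)]
    # Combining the partial sums reveals only the total (input privacy).
    return sum(partial_sums) % MODULUS


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_total(private_counts, epsilon, rng):
    """Release the joint count with Laplace noise (output privacy).

    Assumes sensitivity 1: each individual contributes to one count only.
    Simplification: noise is added after reconstruction, in the clear.
    """
    return secure_sum(private_counts) + laplace_noise(1.0 / epsilon, rng)


custodian_counts = [12, 7, 30]  # made-up per-site patient counts
print(secure_sum(custodian_counts))  # 49
print(dp_total(custodian_counts, epsilon=1.0, rng=random.Random(0)))
```

Each individual share is uniformly random on its own, so no single party learns another custodian's count, while a smaller `epsilon` adds more noise to the released total.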
Additionally, the project will improve the understanding of current techniques’ strengths and limitations in training generative artificial intelligence models for high-fidelity and high-utility synthetic genomics (germline) data. It will investigate the limits of direct germline sequence generation using state-of-the-art generative foundation models, generate variant call file data using frontier models for tabular data generation, and validate the concordance of mutation annotation files derived from synthetic and original variant call files by comparing pathogenicity predictions.
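The abstract does not say how concordance between pathogenicity predictions will be measured. As one possible illustration, the sketch below assumes that each annotated variant carries a categorical pathogenicity label and compares the label distributions of the original and synthetic annotations using total variation distance (0 means identical distributions, 1 means disjoint). The function names, the label set, and the example data are all hypothetical.

```python
from collections import Counter


def label_distribution(labels):
    """Return the relative frequency of each pathogenicity label."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)


# Made-up labels standing in for predictions read from annotation files.
original = ["benign"] * 6 + ["pathogenic"] * 3 + ["uncertain"] * 1
synthetic = ["benign"] * 5 + ["pathogenic"] * 3 + ["uncertain"] * 2

distance = total_variation_distance(
    label_distribution(original), label_distribution(synthetic)
)
print(f"total variation distance: {distance:.2f}")
```

A distribution-level comparison is used here because synthetic individuals have no one-to-one counterpart in the original data; other measures could serve equally well.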
The project focuses on rare diseases such as neurofibromatosis and acute myeloid leukemia, where existing data is scarce and fragmented across different data custodians.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.