Active STANDARD GRANT National Science Foundation (US)

Collaborative Research: MFB: Integrating Deep Learning and High-throughput Experimentation to Rapidly Navigate Protein Fitness Landscapes for Non-native Enzyme Catalysis

$2.56M USD

Funder	National Science Foundation (US)
Recipient Organization	Duke University
Country	United States
Start Date	Apr 01, 2025
End Date	Apr 30, 2026
Duration	394 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2529581`

Grant Description

Understanding the relationship between protein structure and function remains a major challenge. This knowledge would benefit drug design, recycling, and chemical production. This project is designed to learn how to create proteins that will facilitate reactions not seen in nature.

Artificial intelligence will interpret the data generated by experiments. Two classes of enzymes will be modified to facilitate novel reactions. To help diversify the STEM workforce, workshops in machine learning will be offered to students interested in protein design.

Summer research opportunities will be offered to high school and undergraduate students traditionally underrepresented in STEM fields.

In this project, protein engineering is treated as a Bayesian optimization problem, with the objective of exploring sequence space for improved specific activity. This approach models both the expected activity and the uncertainty of that prediction. Training deep learning models is data intensive.
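
One common way to use both the expected activity and its uncertainty is an acquisition function such as expected improvement. The following is a minimal sketch, assuming Gaussian predictions. The variant names and numbers are hypothetical, and the award abstract does not state which acquisition function the project uses.

```python
import math

def expected_improvement(mean, std, best_so_far):
    """Expected improvement of a Gaussian prediction over the best observed value."""
    if std <= 0.0:
        # No uncertainty: the improvement is simply the positive part of the gap.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf

# Hypothetical model outputs: (predicted activity, predictive standard deviation).
predictions = {
    "variant_A": (1.00, 0.05),  # as good as the current best, little uncertainty
    "variant_B": (0.90, 0.60),  # slightly lower mean, large uncertainty
    "variant_C": (0.50, 0.10),  # clearly worse
}
best_so_far = 1.0  # hypothetical best activity measured so far

scores = {
    name: expected_improvement(mean, std, best_so_far)
    for name, (mean, std) in predictions.items()
}
best_candidate = max(scores, key=scores.get)
print(best_candidate)
```

With these made-up numbers the sketch selects variant_B: its predicted mean is slightly lower than variant_A's, but its larger uncertainty leaves more room for an improvement over the current best.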

A convolutional neural network (CNN) using transformer architecture will be pretrained on simulated sequence-function data generated using Rosetta. The pretrained CNN will then be refined with experimental data generated using combinatorial codon mutagenesis (CCM).

Enzyme activity in single bacterial cells will be monitored using GFP expression and FACS-based screening, with next-generation DNA sequencing used to determine the corresponding amino acid sequences. Biosensor screening can suffer from crosstalk when multiple cells are present. A picoliter-scale microdroplet screening technology developed in the Romero lab will be utilized to avoid this issue.

A simulated annealing algorithm will be developed to randomly search over sequence positions and degenerate codons for libraries with high values of the expected batch Bayesian optimization (BO) objective. In addition, a probabilistic program using sampling-based inference to estimate the optimal combination of codons will be designed and implemented.
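
The library search described above can be sketched as a generic simulated annealing loop. This is a minimal illustration under stated assumptions, not the project's algorithm: the codon choices, the `toy_objective` function, and all numeric values are hypothetical stand-ins for the expected batch BO objective, which the abstract does not specify.

```python
import math
import random

# Hypothetical options per position: keep the wild type ("WT") or use a
# degenerate codon written in IUPAC nucleotide codes.
CODON_CHOICES = ["WT", "NNK", "NDT", "DBK"]

# Made-up gains per choice; a real objective would come from the trained model.
TOY_GAIN = {"WT": 0.0, "NNK": 1.0, "NDT": 0.8, "DBK": 0.6}

def toy_objective(library):
    """Placeholder score: reward diversity, penalize more than 3 diversified positions."""
    gain = sum(TOY_GAIN[codon] for codon in library.values())
    n_diversified = sum(1 for codon in library.values() if codon != "WT")
    return gain - 1.5 * max(0, n_diversified - 3)

def simulated_annealing(positions, objective, steps=2000,
                        t_start=1.0, t_end=0.01, seed=0):
    """Search over {position: codon choice} assignments for a high objective value."""
    rng = random.Random(seed)
    current = {pos: "WT" for pos in positions}
    current_score = objective(current)
    best, best_score = dict(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose a random change at one randomly chosen position.
        proposal = dict(current)
        proposal[rng.choice(positions)] = rng.choice(CODON_CHOICES)
        score = objective(proposal)
        delta = score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, score
            if current_score > best_score:
                best, best_score = dict(current), current_score
    return best, best_score

positions = [12, 45, 78, 101, 150]  # hypothetical residue positions
best_library, best_score = simulated_annealing(positions, toy_objective)
print(best_library, best_score)
```

Accepting occasional worse moves at high temperature lets the search escape local optima early on, while the cooling schedule makes it increasingly greedy toward the end.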

This project is jointly supported by the Division of Chemical, Bioengineering, Environmental and Transport Systems (CBET), the Division of Chemistry (CHE), and the Division of Information and Intelligent Systems (IIS).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Duke University

