| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | Duke University |
| Country | United States |
| Start Date | Apr 01, 2025 |
| End Date | Apr 30, 2026 |
| Duration | 394 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2529581 |
Understanding the relationship between protein structure and function remains a major challenge. This knowledge would benefit drug design, recycling, and chemical production. This project is designed to learn how to create proteins that will facilitate reactions not seen in nature.
Artificial intelligence will interpret the data generated by experiments. Two classes of enzymes will be modified to facilitate novel reactions. To help diversify the STEM workforce, workshops in machine learning will be offered to students interested in protein design.
Summer research opportunities will be offered to high school and undergraduate students traditionally underrepresented in STEM fields.
In this project, protein engineering is treated as a Bayesian optimization problem, with the objective of exploring sequence space for variants with improved specific activity. This approach models both the expected activity and the uncertainty of each prediction. Training deep learning models is data intensive.
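As a rough illustration of how a predicted activity and its uncertainty can be combined when choosing which variants to test next, the sketch below scores candidates with two standard acquisition functions: upper confidence bound and expected improvement. The sequences, predicted means, standard deviations, and the `beta` weight are hypothetical placeholders, not values or methods from the project.

```python
import math
from typing import Dict, List, Tuple


def upper_confidence_bound(mean: float, std: float, beta: float = 2.0) -> float:
    """Predicted activity plus a bonus proportional to predictive uncertainty."""
    return mean + beta * std


def expected_improvement(mean: float, std: float, best_so_far: float) -> float:
    """Expected gain over the best observed activity under a Gaussian prediction."""
    if std <= 0.0:
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


def select_batch(
    predictions: Dict[str, Tuple[float, float]], batch_size: int, beta: float = 2.0
) -> List[str]:
    """Return the batch_size sequences with the highest UCB score."""
    ranked = sorted(
        predictions,
        key=lambda seq: upper_confidence_bound(*predictions[seq], beta=beta),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical model output: sequence -> (predicted activity, predictive std).
predictions = {
    "MKTA": (1.00, 0.05),
    "MKTV": (0.90, 0.40),
    "MRTA": (0.60, 0.10),
    "MRSV": (0.50, 0.50),
}

exploratory = select_batch(predictions, batch_size=2, beta=2.0)
greedy = select_batch(predictions, batch_size=2, beta=0.0)
print(exploratory, greedy)
```

With `beta=0.0` the selection ignores uncertainty and simply ranks by predicted activity; larger values favor poorly characterized regions of sequence space.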
A convolutional neural network (CNN) using a transformer architecture will be pretrained on simulated sequence-function data generated with Rosetta. The pretrained CNN will then be refined with experimental data generated using combinatorial codon mutagenesis (CCM).
Enzyme activity in single bacterial cells will be monitored using GFP expression, FACS-based screening, and next-generation DNA sequencing to determine the corresponding amino acid sequences. Biosensor screening can suffer from crosstalk when multiple cells are present. A picoliter-scale microdroplet screening technology developed in the Romero lab will be utilized to avoid this issue.
A simulated annealing algorithm will be developed to search stochastically over sequence positions and degenerate codons for libraries with high values of the expected batch Bayesian optimization (BO) objective. In addition, a probabilistic program that uses sampling-based inference to estimate the optimal combination of codons will be designed and implemented.
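A minimal sketch of simulated annealing over degenerate codon choices is shown below, assuming one degenerate codon per mutated position. The objective is a hypothetical stand-in for the batch BO objective: the average of made-up per-site amino acid scores, a penalty for stop codons, and a small diversity bonus. `SITE_SCORES`, `CANDIDATES`, and all numeric settings are illustrative assumptions.

```python
import math
import random
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encoded_amino_acids(degenerate_codon: str) -> List[str]:
    """Translate every concrete codon covered by a degenerate codon."""
    options = (IUPAC[base] for base in degenerate_codon)
    return [CODON_TABLE["".join(c)] for c in product(*options)]


def library_score(
    library: Sequence[str],
    site_scores: Sequence[Dict[str, float]],
    stop_penalty: float = -5.0,
    diversity_weight: float = 0.1,
) -> float:
    """Toy objective: mean per-site score plus a bonus per distinct amino acid."""
    total = 0.0
    for codon, scores in zip(library, site_scores):
        aas = encoded_amino_acids(codon)
        values = [stop_penalty if aa == "*" else scores.get(aa, 0.0) for aa in aas]
        total += sum(values) / len(values)
        total += diversity_weight * len(set(aas) - {"*"})
    return total


def anneal(
    site_scores: Sequence[Dict[str, float]],
    candidates: Sequence[str],
    steps: int = 2000,
    t_start: float = 1.0,
    t_end: float = 0.01,
    seed: int = 0,
) -> Tuple[List[str], float]:
    """Simulated annealing with single-site moves and geometric cooling."""
    rng = random.Random(seed)
    current = [rng.choice(candidates) for _ in site_scores]
    current_score = library_score(current, site_scores)
    best, best_score = list(current), current_score
    for step in range(steps):
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        proposal = list(current)
        proposal[rng.randrange(len(proposal))] = rng.choice(candidates)
        proposal_score = library_score(proposal, site_scores)
        delta = proposal_score - current_score
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


# Hypothetical per-site amino acid scores for three mutated positions.
SITE_SCORES = [
    {"L": 1.0, "I": 0.8, "V": 0.6},
    {"D": 1.0, "E": 0.9},
    {"K": 1.0, "R": 0.7},
]
CANDIDATES = ["NNK", "NTT", "GAN", "ARG", "CTG", "GAT", "AAG"]

best_library, best_value = anneal(SITE_SCORES, CANDIDATES)
print(best_library, round(best_value, 3))
```

Because this toy objective is additive across positions, annealing is more machinery than it needs. The approach becomes useful when the objective couples positions, as a batch objective evaluated on the whole library would.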
This project is jointly supported by the Division of Chemical, Bioengineering, Environmental and Transport Systems (CBET), the Division of Chemistry (CHE), and the Division of Information and Intelligent Systems (IIS).
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.