Completed STANDARD GRANT National Science Foundation (US)

MFB: Deep-Learning Enabled Structure Prediction and Design of Protein-DNA Assemblies

$14.98M USD

Funder	National Science Foundation (US)
Recipient Organization	University of Washington
Country	United States
Start Date	Sep 01, 2022
End Date	Aug 31, 2025
Duration	1,095 days
Number of Grantees	3
Roles	Principal Investigator; Co-Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2226466`

Grant Description

In this Molecular Foundation for Biotechnology (MFB) project, Professors David Baker and Frank DiMaio of the Department of Biochemistry at the University of Washington and Barry Stoddard of Basic Sciences at the Fred Hutchinson Cancer Center are developing new ways to model and design protein-DNA complexes using deep-learning (DL) methods. To do this, they will develop three DL-based models: (1) a model for prediction of protein-DNA complex structures from sequence, (2) a model for sequence design of protein-DNA complexes, and (3) a model for quality assessment of protein-DNA complex structure predictions and designs.

The DL models developed in this project will be leveraged in a pipeline for design of sequence-specific DNA binding miniproteins capable of targeting specified sequences of dsDNA. The use of the novel DL-based models in this context will be useful for validating model accuracy and will have broad impact as a powerful tool for designing protein-DNA interfaces for biotechnology applications, such as the design of novel transcription factors, nucleic acid modifying enzymes, and gene correction reagents.

This project lies at the interface of DL research, computational protein design, biochemistry, and structural biology and will provide multi-disciplinary training for undergraduates, graduate students, and postdocs involved in the project. The primary goals of the project's outreach and education programs are to attract young people to careers in STEM (science, technology, engineering and mathematics) and to improve training in biochemistry and computational protein design.

The outreach plan involves a multi-pronged effort focused on engaging undergraduates through individually mentored summer research and a cohort-based undergraduate research program that will run during the academic year. Both efforts will be focused on training undergraduates in contemporary methods in computational protein design and experimental methods for validating protein function, including the novel methods developed in this proposal.

This project seeks to develop a suite of machine learning/deep learning (ML/DL) techniques for modeling protein-DNA complexes. New tools capable of inferring protein-DNA complex structures, predicting the nucleotide specificity of DNA-binding proteins (DBPs), and evaluating accuracy of protein-DNA complex models would be invaluable in solving salient technological problems, such as developing novel transcription factors.

Current approaches lack accuracy or are computationally intensive, primarily due to the difficulties in modeling indirect readout of DNA conformational flexibility, hydrogen bonding and electrostatic interactions, metal ion cofactors, and the highly solvated interfaces of protein-DNA complexes. The specific goals are to develop DL-based methods for (1) inference of structure models of DNA and protein-DNA complexes from sequences and sequence alignments, based on the recently developed RoseTTAFold model, an ML framework for predicting protein structures; (2) a sequence prediction neural network for designing sequence-specific DBPs and predicting their specificity given protein-DNA complex backbone information; and (3) an accuracy prediction model for evaluating structural models of protein-DNA complexes.

The three DL methods developed in this project will be leveraged in the design of DBPs. Designed DBPs will be experimentally validated in a high-throughput pooled format using yeast display, cell sorting, and next-generation sequencing methods to approximate the binding affinity of pooled designs. Designs showing DNA-binding activity in yeast display experiments will be further characterized for DNA-binding affinity and specificity using in vitro biochemistry techniques, and the design models will be confirmed with X-ray co-crystallization.
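
As a rough illustration of how pooled sequencing reads can serve as a proxy for binding, the sketch below computes a per-design log2 enrichment between a naive pool and a sorted (binder-selected) pool. This is a commonly used style of analysis, not necessarily the one used in this project; the function name, the pseudocount, the design names, and the read counts are all hypothetical.

```python
import math

def log2_enrichment(naive_counts, sorted_counts, pseudocount=1.0):
    """Return the log2 enrichment of each design (sorted pool vs. naive pool).

    Both arguments map a design name to its sequencing read count.
    The pseudocount (an assumption, not from the source) avoids division
    by zero for designs missing from one of the pools.
    """
    designs = set(naive_counts) | set(sorted_counts)
    # The pseudocount is added to every design, so include it in the totals.
    naive_total = sum(naive_counts.values()) + pseudocount * len(designs)
    sorted_total = sum(sorted_counts.values()) + pseudocount * len(designs)
    scores = {}
    for design in designs:
        naive_freq = (naive_counts.get(design, 0) + pseudocount) / naive_total
        sorted_freq = (sorted_counts.get(design, 0) + pseudocount) / sorted_total
        scores[design] = math.log2(sorted_freq / naive_freq)
    return scores

# Hypothetical read counts for three pooled designs.
naive = {"design_A": 100, "design_B": 100, "design_C": 100}
sorted_pool = {"design_A": 400, "design_B": 90, "design_C": 10}

scores = log2_enrichment(naive, sorted_pool)
# Rank designs from most to least enriched in the sorted pool.
ranked = sorted(scores, key=scores.get, reverse=True)
for design in ranked:
    print(f"{design}: {scores[design]:+.2f}")
```

Designs with positive scores are over-represented after sorting and would be the natural candidates for the follow-up in vitro characterization; the score is only an approximate ranking signal, not a measured affinity.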

Application of the ML models in this design context will provide validation of model accuracy and result in a powerful tool for designing protein-DNA interfaces for biotechnology applications, such as the design of novel transcription factors, nucleic acid modifying enzymes, and gene correction reagents.

This project is jointly supported by the Division of Chemistry (CHE), the Division of Information and Intelligent Systems (IIS), and the Division of Physics (PHY).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Washington
