Completed STANDARD GRANT National Science Foundation (US)

Collaborative Research: EAGER: Solving the bait learning problem for large-scale DNA enrichment

$1.59M USD

Funder National Science Foundation (US)
Recipient Organization University of Florida
Country United States
Start Date Sep 01, 2021
End Date Aug 31, 2024
Duration 1,095 days
Number of Grantees 2
Roles Principal Investigator; Co-Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2118251
Grant Description

The microbiome refers to all micro-organisms within a biological sample and has been linked to numerous biological activities and phenotypes. For example, the “gut microbiome”, which commonly refers to the bacteria found within the human digestive system, has been linked to many phenotypes and diseases, including obesity, Alzheimer’s disease, autism spectrum disorder, and some types of cancer.

Similarly, the soil microbiome has been linked to drought tolerance, flowering time, and pesticide resistance in plants. Comprehensively studying the micro-organisms in a biological sample is challenging because an estimated 95% to 99% of them cannot live outside their natural environments and therefore cannot be isolated in a laboratory.

Fortunately, shotgun metagenomics can address this challenge: it takes as input the DNA of all the micro-organisms within a sample and produces the corresponding DNA strings (called “sequence reads”). These sequence reads are then analyzed further to identify and study the micro-organisms. However, scientists are frequently interested in studying a limited number of micro-organisms rather than all of them.

For example, when studying the micro-organisms from a respiratory swab, only the sequence reads corresponding to SARS-CoV-2, the virus that causes COVID-19, may be of interest, yet shotgun metagenomics will produce sequence reads for all DNA found on the swab. Fortunately, there are laboratory methods capable of restricting the sequencing to a selected set of DNA sequences, which is referred to as DNA enrichment.

DNA enrichment creates a set of short, synthetic DNA fragments (called “baits”) and applies them to a biological sample, where they bind only to selected portions of the DNA. The unbound DNA is then rinsed away, leaving only the bound DNA, which is then sequenced. Hence, the first step of this process is the informatics problem of identifying the baits that will enrich for a given set of DNA sequences.

Two things make the informatics problem challenging: the bait set should be as small as possible, and a bait does not bind exactly to a single DNA sequence but can bind to many sequences with some allowable mismatches. This project will combine techniques from informatics, machine learning, and data integration to create practical methods for designing baits.
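As a rough illustration of binding with allowable mismatches, the sketch below treats a bait as binding wherever it aligns to a window of the target with at most a fixed number of mismatched bases. This is a simplifying assumption, not the project's model: real hybridization also depends on factors such as strand orientation and chemistry, and the names `hamming`, `binding_positions`, and `max_mismatches` are hypothetical.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def binding_positions(bait: str, sequence: str, max_mismatches: int) -> List[int]:
    """Return every start index where the bait aligns to the sequence
    with at most max_mismatches mismatched bases (assumed binding rule)."""
    k = len(bait)
    return [
        start
        for start in range(len(sequence) - k + 1)
        if hamming(bait, sequence[start:start + k]) <= max_mismatches
    ]


# With one mismatch allowed, the bait also binds at a second site.
exact_sites = binding_positions("ACGT", "ACGTTACGA", 0)
near_sites = binding_positions("ACGT", "ACGTTACGA", 1)
```

Raising `max_mismatches` lets one bait bind more sites, which is why a single bait can cover many sequences and why choosing a small bait set is a combinatorial problem.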

While existing heuristics have demonstrated the utility of DNA enrichment, no existing method can design baits efficiently for large DNA databases. Hence, one focus of this project is to create scalable methods. To accomplish this, we will develop methods for solving two informatics problems: (1) the Bait Minimization problem, which aims to identify the smallest set of baits that enriches for the entire set of DNA sequences, and (2) the DNA Sequence Maximization problem, which aims to select the largest subset of DNA sequences that can be enriched using a bait set of restricted size.
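The two problems resemble set cover and maximum coverage, so a greedy heuristic gives a small baseline for both. The sketch below is not the project's method. It assumes a sequence counts as enriched when at least one bait matches some window of it within a mismatch limit, and it draws candidate baits from the substrings of the input sequences; all function and parameter names are hypothetical. Greedy selection does not guarantee a minimal bait set.

```python
from typing import Dict, List, Optional, Set, Tuple


def binds(bait: str, sequence: str, max_mismatches: int) -> bool:
    """True if the bait matches some window of the sequence within the limit."""
    k = len(bait)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if sum(x != y for x, y in zip(bait, window)) <= max_mismatches:
            return True
    return False


def greedy_select(
    sequences: List[str],
    bait_length: int,
    max_mismatches: int,
    budget: Optional[int] = None,
) -> Tuple[List[str], Set[int]]:
    """Repeatedly pick the candidate bait that enriches the most
    not-yet-covered sequences. Returns (baits, covered sequence indices)."""
    # Assumption: candidate baits are the substrings of the input sequences.
    candidates = sorted(
        {s[i:i + bait_length] for s in sequences
         for i in range(len(s) - bait_length + 1)}
    )
    covers: Dict[str, Set[int]] = {
        bait: {idx for idx, s in enumerate(sequences)
               if binds(bait, s, max_mismatches)}
        for bait in candidates
    }
    chosen: List[str] = []
    covered: Set[int] = set()
    while candidates and (budget is None or len(chosen) < budget):
        best = max(candidates, key=lambda bait: len(covers[bait] - covered))
        gain = covers[best] - covered
        if not gain:  # no remaining bait covers anything new
            break
        chosen.append(best)
        covered |= gain
    return chosen, covered


def bait_minimization(sequences: List[str], bait_length: int,
                      max_mismatches: int) -> List[str]:
    """Problem (1): cover every coverable sequence with few baits."""
    return greedy_select(sequences, bait_length, max_mismatches)[0]


def sequence_maximization(sequences: List[str], bait_length: int,
                          max_mismatches: int, budget: int) -> Set[int]:
    """Problem (2): cover as many sequences as possible with at most budget baits."""
    return greedy_select(sequences, bait_length, max_mismatches, budget)[1]


toy = ["AAAAC", "AAAAG", "TTTTT"]
baits = bait_minimization(toy, bait_length=4, max_mismatches=0)
enriched = sequence_maximization(toy, bait_length=4, max_mismatches=0, budget=1)
```

This baseline precomputes which sequences every candidate bait covers, so its cost grows with the number of candidates times the total sequence length. That growth is the kind of scaling limit the project aims to overcome for large DNA databases.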

Here, we will integrate information about the biological process into the problem formulations and solve the problems using state-of-the-art informatics approaches. This project will result in several innovations that will have a major impact on informatics, data science, and other scientific disciplines. The broader impact of this work will encompass furthering our knowledge of micro-organisms and creating a curriculum for Girls Engaged in Engineering Days at the University of Florida.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Florida
