Completed STANDARD GRANT National Science Foundation (US)

EAGER: Accelerating Synthetic Biology Discovery through Integrated Curation

$3M USD

Funder National Science Foundation (US)
Recipient Organization University of Colorado At Boulder
Country United States
Start Date Aug 15, 2022
End Date Dec 31, 2025
Duration 1,234 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2231864
Grant Description

Synthetically designed systems have applications in areas such as sustainable agriculture, manufacturing, sensor development, defense, and medicine. However, the progress and usefulness of synthetic biology have been impeded by the time required for literature studies and by the replication of existing but poorly documented work. Attempts have been made to use machine learning and other methods to extract information from publications.

While these post-hoc curation methods have had some success, the annotations they produce are extremely error-prone. This project proposes to move away from post-hoc curation toward integrated curation in order to create truly digital publications. This project has the potential to update the scientific publishing process and, as a result, reduce the effort required for literature searches. This would increase knowledge reuse and reduce duplicated effort.

This project builds upon the NSF-funded Synthetic Biology Knowledge System (SBKS) project, which sought to address these challenges by integrating data from parts repositories with information extracted from the literature into a unified knowledge system. However, this form of post-hoc curation requires that knowledge be extracted from manuscripts and supplemental files after publication, by curators other than the original authors.

To handle large amounts of data, automated tools scour free text, attempting to recognize key words and infer their meaning from context. This approach tests the limits of natural language processing techniques. Additionally, it leaves ambiguous entities that only the original authors could disambiguate.

For example, "yeast" may refer to any of many different yeast strains. The SBKS project also extracted sequences provided as supplemental information in publications. However, these sequences, when they are provided at all, are typically poorly annotated, incomplete, and supplied in non-machine-readable formats.

Taken together, the SBKS project demonstrated that reconstructing this important design information through post-hoc curation is extremely noisy and error-prone. This project has two main research aims: 1) the creation of an integrated curation framework and 2) the development of a search framework that takes advantage of the curated data provided through the curation interface.

Upon completion of these aims, this project will have made it easier for authors to submit genetic design information in accordance with the FAIR (findable, accessible, interoperable, and reusable) principles, and will thus enable researchers to search for sequences and related genetic designs. Finally, to increase the impact of these aims, we will also develop educational materials and participate in outreach events at key workshops and conferences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Colorado At Boulder