Active CONTINUING GRANT National Science Foundation (US)

CAREER: Future phylogenies: novel computational frameworks for biomolecular sequence analysis involving complex evolutionary origins

$4.58M USD

Funder	National Science Foundation (US)
Recipient Organization	Michigan State University
Country	United States
Start Date	Mar 01, 2022
End Date	Feb 28, 2027
Duration	1,825 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2144121`

Grant Description

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).

Phylogenetics is the discipline that seeks to reconstruct and analyze the phylogeny, or evolutionary history, of a set of organisms. Phylogenetic reconstruction is primarily accomplished through computational analysis of DNA and other biomolecular sequence data. Phylogenies and the evolutionary insights that they provide are essential to biology and other disciplines, as well as many applications: important examples include reconstructing and studying the Tree of Life - the evolutionary history of all life on Earth, understanding human origins, infectious disease epidemiology and discovery of new solutions to future pandemics, crop improvement and agriculture, and forensic science.

One of the two key ingredients needed for phylogenetic studies has seen a major leap forward thanks to advances in biomolecular sequencing technology: the scale of available biomolecular data is now among the largest in any domain and, in 2025, biomolecular data velocity and storage are projected to be comparable to or larger than those of Twitter and YouTube. On the other hand, recent "big data" phylogenetic studies point to a critical gap regarding the second of the two key ingredients in phylogenetics: existing computational algorithms need to move beyond their traditional simplifying assumptions about biomolecular sequence evolution.

Two of the most important assumptions are: (1) the "sequence-unaware" assumption, under which methods ignore the inherently sequential nature of biomolecular sequences, and (2) the pre hoc assumption that evolutionary relationships have a simple branching structure and are "tree-like" - i.e., can be accurately described by a tree or other simple representation. New computational approaches and infrastructure are needed to move beyond these traditional assumptions and unlock the study of "future phylogenies" and next-generation phylogenetics.

This project will therefore create new pathbreaking models and algorithms for complex phylogenetic analyses of biomolecular sequence data. The project also addresses gaps in STEM education through new curriculum development and a collaboration with the Impression 5 Science Center, a children’s science museum in mid-Michigan. Project impacts will be broadened through open-source software distributions and open data resources, new scientific discoveries enabled by the developed software and data infrastructure, scientific outreach activities, and student training and mentoring with a strong emphasis on diversity, equity, and inclusion (DEI).

This project will advance the field of computational phylogenetics along multiple frontiers. The first research objective is to develop new statistical resampling algorithms that move beyond "uninformed" analysis where biomolecular data are assumed to be independent and identically distributed (i.i.d.), and towards "informed" sequence-aware analysis; a central approach will be to make use of the latest advances in machine learning.
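As a rough illustration of the distinction drawn above, the sketch below contrasts the classic i.i.d. bootstrap over alignment columns with a moving-block bootstrap that keeps adjacent sites together. It is not the project's algorithm: the block bootstrap is a standard technique swapped in only to show what "sequence-aware" resampling can mean, and the function names, the toy alignment, and the block length are all hypothetical.

```python
import random

def iid_bootstrap(alignment, rng):
    """Classic bootstrap: sample alignment columns independently, with replacement."""
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

def block_bootstrap(alignment, block_len, rng):
    """Moving-block bootstrap: sample runs of adjacent columns so that
    the local site order inside each block is preserved."""
    n_sites = len(alignment[0])
    cols = []
    while len(cols) < n_sites:
        start = rng.randrange(n_sites - block_len + 1)
        cols.extend(range(start, start + block_len))
    cols = cols[:n_sites]  # trim the last block to the original length
    return ["".join(seq[c] for c in cols) for seq in alignment]

# Hypothetical toy alignment: three aligned sequences of ten sites each.
alignment = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTTC"]
rng = random.Random(0)
iid_replicate = iid_bootstrap(alignment, rng)
block_replicate = block_bootstrap(alignment, block_len=3, rng=rng)
```

Both replicates have the same dimensions as the input alignment; only the block version guarantees that each run of three sites appears contiguously in the original sequences.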

The new algorithms will be used to better assess rigor and reproducibility during phylogenetic analyses and other critical-path analytical tasks. The second research objective is to create mathematical theory, statistical models, and computational algorithms to move beyond traditional phylogenetic representations (e.g., phylogenetic trees), and towards more general graph-theoretic models of complex genome evolution.
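To make the tree-versus-graph distinction concrete, here is a minimal sketch assuming a phylogeny is stored as a list of directed (parent, child) edges. In a tree every node has at most one parent; a more general network allows reticulation nodes with two or more parents (for example, a hybrid lineage). The node names and helper functions are hypothetical and do not represent the project's models.

```python
from collections import defaultdict

def parents_by_node(edges):
    """Map each child node to the set of its parent nodes."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return parents

def reticulation_nodes(edges):
    """Return nodes with more than one parent; a tree has none."""
    return sorted(n for n, ps in parents_by_node(edges).items() if len(ps) > 1)

# Hypothetical tree on taxa A, B, C.
tree_edges = [("root", "x"), ("root", "C"), ("x", "A"), ("x", "B")]

# Hypothetical network: node "h" inherits from both "x" and "y".
network_edges = [
    ("root", "x"), ("root", "y"),
    ("x", "A"), ("x", "h"),
    ("y", "h"), ("y", "C"),
    ("h", "B"),
]

tree_reticulations = reticulation_nodes(tree_edges)
network_reticulations = reticulation_nodes(network_edges)
```

Here `tree_reticulations` is empty, while `network_reticulations` contains the single node `"h"`, which is exactly the feature a tree-only representation cannot express.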

The third research objective is to conduct comprehensive validation and performance assessment studies of the first two research objectives’ computational frameworks. The studies will utilize both synthetic and empirical benchmarking datasets that capture a wide range of evolutionary conditions and dataset features. The project also includes two educational objectives: a new course on DEI topics in interdisciplinary computer science, and a new museum exhibit on technology and computer programming that will be exhibited at the Impression 5 Science Center.

Open-source software and open data deliverables will drive future methodological research and enable otherwise inaccessible scientific discoveries, and scientific outreach will help seed and drive uptake of the project’s contributions. The project also includes student training and mentoring activities at the undergraduate and graduate levels. Project deliverables and other results can be found at https://gitlab.msu.edu/liulab.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Michigan State University
