Active NON-SBIR/STTR RPGS NIH (US)

Methods For Evolutionary Genomics Analysis

$2.32M USD

Funder	NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
Recipient Organization	Temple University of the Commonwealth
Country	United States
Start Date	Feb 01, 2021
End Date	Jan 31, 2026
Duration	1,825 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	NIH (US)
Grant ID	`11099368`

Grant Description

Summary

The parent R35 research program aims to develop innovative methods and tools for the comparative analysis of molecular sequences. The focus is on creating machine-learning methods to perform big data analytics and gain biological insights, and on comparing them with traditional model-based methods in molecular evolution and phylogenetics. A key development in this program is the Evolutionary Sparse Learning (ESL) framework, designed to enhance molecular evolutionary analyses. Although ESL has been benchmarked against classical methods using high-performance computing (HPC) resources, benchmarking against advanced deep learning (DL) approaches remains infeasible because of the substantial computational power required. To address this, we request a Graphics Processing Unit (GPU) cluster to enable DL analyses essential for advancing our research. Two major example projects highlight the need for this system.

The first project focuses on discovering fragile clades and causal sequences in phylogenomics. We have developed metrics for gene-species sequence concordance and clade probability using ESL models, validated across many phylogenomic datasets. Benchmarking these ESL methods against DL approaches, such as MSA Transformer, is crucial. MSA Transformer captures phylogenetic relationships using multiple sequence alignments (MSAs) but requires refinement for orthologous protein sets, which demands a powerful GPU system.

The second project aims to uncover molecular convergences that parallel organismal convergent evolution. Using ESL, we have built genetic models to understand the independent origins of traits such as C4 photosynthesis in grasses and echolocation in mammals. Benchmarking revealed that current methods, including ESL, are limited in detecting convergences involving different residues at different sites. Therefore, we are developing ESL approaches that leverage DL-generated protein embeddings to infer non-identical sequence convergence. Fine-tuning general DL models for orthologous sequences requires a dedicated GPU cluster, as existing resources are inadequate for the extensive analyses needed.

The requested GPU cluster is essential for refining these DL models and conducting comprehensive analyses, enhancing the impact and scope of our parent grant. Our experienced team and institutional support ensure effective use and maintenance of the equipment, promoting continued advancements in molecular evolutionary analysis.

All Grantees

Temple University of the Commonwealth

