Active STANDARD GRANT National Science Foundation (US)

ACED: Fast and Scalable Whole Genome Analysis on Emerging Hardware Technologies

$5M USD

Funder National Science Foundation (US)
Recipient Organization Cornell University
Country United States
Start Date Apr 15, 2025
End Date Mar 31, 2027
Duration 715 days
Number of Grantees 2
Roles Principal Investigator; Co-Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2435801
Grant Description

Whole Genome Sequencing (WGS) is a powerful tool for uncovering genetic variants linked to diseases, understanding evolutionary processes, and tracing population histories. With over a million human genomes sequenced to date, the sheer volume of data requires advanced computational solutions for efficient analysis. Our research uses the Genotype Representation Graph (GRG) to significantly improve the performance of WGS data analysis.

By optimizing this data structure and using modern parallel computing architectures and techniques, we aim to reduce the time required for complex genomic analyses and enable fast, efficient processing of large datasets such as those housed in the UK Biobank. This project aims to develop tools and infrastructure that will enable researchers to advance our understanding of human genetics and improve the accuracy of population genetic studies, ultimately contributing to better health outcomes and greater scientific knowledge.

In addition, this approach to representing and efficiently manipulating large, complex datasets will serve as a proxy for modern computing workloads and help guide the design of advanced parallel computing architectures and techniques.

This research focuses on improving the efficiency of Whole Genome Sequencing (WGS) data analysis through two main objectives. The first objective is to optimize the Genotype Representation Graph (GRG) for modern parallel computing architectures, particularly GPUs, to handle the dynamic nature of genomic computations. In addition, a matrix abstraction of the GRG will be developed to enable efficient computation on architectures beyond GPUs, using sparse matrices to achieve near-linear scaling on distributed-memory machines.
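The idea of a sparse-matrix view of a graph-based genotype representation can be illustrated with a small sketch. This Python is not the project's implementation. It assumes a toy directed acyclic graph in which samples are leaf nodes, each mutation is attached to a single node, and every sample below a node is reachable by exactly one path. Under those assumptions, the number of samples carrying each mutation solves c = s + A c, where A is the sparse parent-to-child adjacency matrix and s marks the sample leaves, so it can be computed with repeated sparse matrix-vector products. The names `sparse_matvec` and `samples_below`, the toy graph, and the mutation mapping are all hypothetical.

```python
from typing import Dict, List, Set


def sparse_matvec(children: Dict[int, List[int]], x: List[int]) -> List[int]:
    """Return y = A @ x, where A[p][c] = 1 iff c is a child of p.

    Only nonzero rows (nodes that have children) are stored, so the cost is
    proportional to the number of edges rather than num_nodes ** 2.
    """
    y = [0] * len(x)
    for parent, kids in children.items():
        y[parent] = sum(x[c] for c in kids)
    return y


def samples_below(children: Dict[int, List[int]], num_nodes: int,
                  sample_nodes: Set[int]) -> List[int]:
    """Solve c = s + A c by fixed-point iteration.

    c[n] counts paths from node n to sample leaves; under the toy assumption
    of unique paths, that equals the number of samples below n. On a DAG, A is
    nilpotent, so the iteration reaches c = (I + A + A^2 + ...) s within
    num_nodes steps.
    """
    s = [1 if n in sample_nodes else 0 for n in range(num_nodes)]
    c = list(s)
    for _ in range(num_nodes):
        nxt = [si + yi for si, yi in zip(s, sparse_matvec(children, c))]
        if nxt == c:  # fixed point reached: c = s + A c
            break
        c = nxt
    return c


# Hypothetical toy graph: nodes 0-3 are haploid sample leaves;
# node 4 covers samples {0, 1}; node 5 covers node 4 plus sample 2.
toy_children = {4: [0, 1], 5: [4, 2]}
# Hypothetical mapping of mutations to the graph nodes they are attached to.
toy_mutations = {"m0": 4, "m1": 5}

counts_per_node = samples_below(toy_children, num_nodes=6, sample_nodes={0, 1, 2, 3})
allele_counts = {m: counts_per_node[node] for m, node in toy_mutations.items()}
print(allele_counts)  # {'m0': 2, 'm1': 3}
```

Because each product touches only stored edges, its cost scales with graph size rather than with the full samples-by-mutations matrix. That property is what makes a sparse formulation a natural candidate for distributing across machines. A production version would more likely rely on an established sparse-matrix library and partition A across nodes.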

The second objective is to use the improved GRG to perform accurate Ancestral Recombination Graph (ARG) inference, a critical step in population genetics. Implementing and testing these approaches on large-scale UK Biobank data will demonstrate the scalability and accuracy of the new methods. This interdisciplinary project will combine high-performance computing advances with innovative data structures to answer key questions in population genetics and provide insights for future high-performance systems in the post-Moore's Law era.

This award is co-funded by the Directorate for Computer and Information Science and Engineering and by the Directorate for Biological Sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Cornell University