Active NON-SBIR/STTR RPGS NIH (US)

Gfastar: a C++ library and a tool suite to aid Telomere-to-Telomere genome assembly

$1.7M USD

Funder NATIONAL HUMAN GENOME RESEARCH INSTITUTE
Recipient Organization Rockefeller University
Country United States
Start Date Sep 25, 2024
End Date Aug 31, 2026
Duration 705 days
Number of Grantees 1
Roles Principal Investigator
Data Source NIH (US)
Grant ID 10987122
Grant Description

Project Summary

The recent completion of a Telomere-to-Telomere (T2T) human genome has demonstrated that, in principle, existing sequencing technologies allow gapless and nearly error-free assembly of complex, human-sized genomes. Despite these technological advancements, genome assemblies that are currently being generated and released in public archives are still incomplete and contain a significant number of errors, which can dramatically impact downstream analyses. Algorithms that can generate T2T genomes are still in their infancy. The few that are available require extensive manual validation and curation and have so far worked on only a handful of model species. Dedicated algorithms and software tools are essential for achieving T2T assembly completeness and accuracy in all species. In particular, extensive evaluation and sophisticated manipulation of genome assembly graphs are required for T2T genome assembly, yet an efficient tool suite for this purpose is missing.

To bridge this gap, gfastar, a suite of algorithms and tools created for the evaluation and manipulation of assembly graphs, will be further advanced and continuously maintained. Gfastar is under active development, and it is currently used by large-scale initiatives aimed at the generation of high-quality reference genomes, such as the Vertebrate Genomes Project. Gfastar is powered by a dedicated C++ library, gfalibs. Gfalibs will be expanded to provide a comprehensive library dedicated to genome sequences and assembly graphs that can support multiple file formats commonly used by the genome assembly community (e.g., FASTA, FASTQ, GFA1/2, AGP, GAF, BAM, and FASTG), parallelized input/output (I/O) processing, and many other general-purpose functions and utilities. This library will be extensively used by the whole gfastar software ecosystem (rdeval, gfastats, gfalign, kcount, kreeq, teloscope, and gfase). Several modules have already been implemented in gfastar. These existing modules will be expanded with additional functionalities, and new tools will be developed. All these tools will synergistically contribute to the generation of T2T reference genomes at scale. As a whole, the gfastar tool suite will provide unparalleled algorithms and functionalities for assembly graph evaluation, manipulation, and analysis, significantly supporting the genomic community by helping improve the completeness and accuracy of genomes.

All Grantees

Rockefeller University
