Active NON-SBIR/STTR RPGS NIH (US)

Enabling data quality assessment of organelle genomes archived on GenBank through novel open-source software tools

$1.2M USD

Funder NATIONAL LIBRARY OF MEDICINE
Recipient Organization Fort Hays State University
Country United States
Start Date Sep 10, 2024
End Date Aug 31, 2026
Duration 720 days
Number of Grantees 1
Roles Principal Investigator
Data Source NIH (US)
Grant ID 10858197
Grant Description

Project Summary/Abstract

The project aims to enable scientists from various biomedical disciplines to make informed, evidence-based decisions on the reuse of archived organelle genomes. It aims to give scientists the computational means to evaluate data quality among the thousands of mitochondrial and plastid genomes stored on the sequence database GenBank. Organelle genomes archived on GenBank are retrieved and employed in many biomedical investigations, including on human genetics, microbiology, environmental health, toxicology, and forensics. However, many studies ignore that a considerable proportion of mitochondrial and plastid genomes on GenBank exhibit signs of incorrect genome assembly, incomplete sequence annotation, or both. Indications of low data quality are even found among organellar genome records with reference genome status. Hence, new computational methods are needed to assess the data quality of GenBank-archived organelle genomes so that only accurate and reliable genome records are selected and integrated into new analyses. The proposed project develops such methods. It generates novel software tools that enable scientists to assess GenBank-archived organelle genomes from various eukaryotic lineages in an automated, standardized fashion. The new tools enable users to evaluate, quantify, and visualize those aspects of organellar genome records on GenBank that are applicable across all such records and indicative of their genome accuracy and completeness. A total of four software tools are developed. Tool #1 automatically links organellar genome records to their short-read data in the database SRA. Tool #2 assesses the quality of organelle genomes by measuring sequencing coverage and sequencing evenness. Tool #3 assesses genome quality by comparing the genome sequence of a given record to its de novo re-assembly under modern assemblers. Tool #4 assesses genome quality by contrasting the gene annotations of a given record to those of closely related individuals or species. Quality assessments at different scales are enabled by implementing features that support the integration of each tool into automated analysis pipelines. The tools are tested on large and diverse sets of GenBank-archived organelle genomes, including a data set of thousands of human mitochondrial genomes. Each tool is written in a common scripting language and distributed as an open-source application to encourage wide reuse by other scientists. As a result, the project expands the existing computational toolkit for organelle genomics and allows scientists across different biomedical disciplines to use quality metrics to decide which mitochondrial and plastid genomes on GenBank to reuse. Taking place at a predominantly undergraduate institution, the project actively includes student researchers with the goal of training them in genome assembly, annotation, and bioinformatics tool development.
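As an illustration of the kind of measurement Tool #2 is described as making, the following is a minimal sketch of how sequencing coverage and evenness could be summarized from per-base read depths. It is not the project's software. The function name `coverage_summary`, the `low_fraction` parameter, and the choice of metrics (mean depth, coefficient of variation, and the share of low-depth positions) are all assumptions made for this example; the abstract does not say which metrics the actual tool reports.

```python
from statistics import mean, pstdev


def coverage_summary(depths, low_fraction=0.2):
    """Summarize per-base read depth for one genome record.

    `depths` is a list with one read-depth value per genome position.
    `low_fraction` is a hypothetical cutoff: positions with depth below
    this fraction of the mean depth are counted as low coverage.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")

    avg = mean(depths)

    # Coefficient of variation (standard deviation / mean).
    # Lower values indicate more even coverage across the genome.
    cv = pstdev(depths) / avg if avg > 0 else float("inf")

    # Share of positions whose depth falls below the cutoff.
    threshold = low_fraction * avg
    low = sum(1 for d in depths if d < threshold) / len(depths)

    return {
        "mean_depth": avg,
        "cv": cv,
        "low_coverage_fraction": low,
    }


# Made-up depths for illustration: two positions with near-zero depth
# stand in for a region that the reads barely support.
example_depths = [30, 32, 28, 31, 2, 0, 29, 30]
summary = coverage_summary(example_depths)
print(summary)
```

In practice the per-base depths would come from mapping a record's short reads back to its genome sequence. A high coefficient of variation or a large low-coverage share would flag the record for closer inspection rather than prove an assembly error.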

All Grantees

Fort Hays State University
