Active | NON-SBIR/STTR RPGs | NIH (US)

A deep reinforcement learning framework for haplotype assembly

$4.58M USD

Funder National Human Genome Research Institute (NHGRI)
Recipient Organization Broad Institute, Inc.
Country United States
Start Date Jul 15, 2024
End Date Jun 30, 2026
Duration 715 days
Number of Grantees 1
Roles Principal Investigator
Data Source NIH (US)
Grant ID 10871190
Grant Description

Haplotype assembly is the problem of reconstructing the combination of alleles on the maternally and paternally inherited chromosome copies, and it is key to our understanding of human population genetics and disease. Numerous statistical and molecular approaches have been developed to enable haplotype reconstruction. In this work, we focus on read-based phasing of individual genomes, which assembles the two haplotypes from whole-genome-sequencing read alignments and variant genotypes. Fragments that span more than one heterozygous variant provide molecular linkage evidence for alleles occurring on the same haplotype and can therefore be leveraged for haplotype assembly; however, sequencing errors make this problem challenging. Existing techniques often cast it as an NP-hard combinatorial optimization problem and rely on hand-engineered heuristics to find a solution.

Here we propose a novel framework based on deep reinforcement learning, which combines the representational power of deep learning with reinforcement learning, to automatically learn effective algorithms that accurately partition read fragments into two haplotype sets given inputs from different sequencing platforms. Importantly, this approach does not require labeled training data, which allows us to use all publicly available datasets collected in large-scale sequencing repositories, such as the 1000 Genomes Project, as training data for our models. Given the complex combinatorial structure of genomic data, an important aspect of this work is the design and compilation of a representative training dataset to ensure model generalizability. Our preliminary results show that this approach can achieve state-of-the-art phasing block lengths and lower error rates on short-read inputs.
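To make the combinatorial problem concrete: the description does not say which formulation existing tools use, but a common choice in read-based phasing is Minimum Error Correction (MEC), which is known to be NP-hard. The Python sketch below assumes MEC and a toy encoding in which each fragment maps a heterozygous-variant index to an observed allele (0 or 1). The helpers `mec_cost` and `brute_force_phase` are hypothetical, written only for illustration; they are not this project's method. The sketch scores a bipartition of fragments into two haplotype sets and finds an optimum by exhaustive search, which needs 2^(n-1) evaluations and is only feasible for a handful of fragments.

```python
from itertools import product

# Hypothetical toy encoding: a fragment maps a heterozygous-variant index
# to the allele (0 or 1) observed on that read.
Fragment = dict[int, int]


def mec_cost(fragments: list[Fragment], labels: list[int]) -> int:
    """Minimum Error Correction (MEC) cost of a bipartition of fragments.

    labels[i] is 0 or 1 and assigns fragments[i] to one of the two haplotype
    sets. Within each set and at each variant, every allele that disagrees
    with the set's majority allele counts as one correction.
    """
    # counts[(side, variant)] = [number of 0 alleles, number of 1 alleles]
    counts: dict[tuple[int, int], list[int]] = {}
    for frag, side in zip(fragments, labels):
        for var, allele in frag.items():
            counts.setdefault((side, var), [0, 0])[allele] += 1
    # The minority allele count in each (set, variant) column must be "corrected".
    return sum(min(c) for c in counts.values())


def brute_force_phase(fragments: list[Fragment]) -> tuple[int, list[int]]:
    """Exhaustively search all bipartitions; exponential, so tiny inputs only."""
    if not fragments:
        return 0, []
    best_cost, best_labels = None, None
    n = len(fragments)
    # Fix fragment 0 to side 0: swapping the two haplotype sets leaves the
    # MEC cost unchanged, so this halves the search to 2^(n-1) labelings.
    for rest in product((0, 1), repeat=n - 1):
        labels = [0, *rest]
        cost = mec_cost(fragments, labels)
        if best_cost is None or cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels
```

For example, with the five fragments `[{0: 0, 1: 0, 2: 0}, {1: 0, 2: 0, 3: 0}, {0: 1, 1: 1}, {1: 1, 2: 1, 3: 1}, {2: 0, 3: 1}]`, the last fragment disagrees with each candidate haplotype at exactly one site, so the best achievable MEC cost is 1, while putting every fragment in one set costs 5. The exponential search is the bottleneck that hand-engineered heuristics, or in this project a learned policy, are meant to sidestep.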

All Grantees

Broad Institute, Inc.

Advertisement
Apply for grants with GrantFunds
Advertisement
Browse Grants on GrantFunds
Interested in applying for this grant?

Complete our application form to express your interest and we'll guide you through the process.

Apply for This Grant