Active NON-SBIR/STTR RPGS NIH (US)

A deep reinforcement learning framework for haplotype assembly

$4.58M USD

Funder	NATIONAL HUMAN GENOME RESEARCH INSTITUTE
Recipient Organization	Broad Institute, Inc.
Country	United States
Start Date	Jul 15, 2024
End Date	Jun 30, 2026
Duration	715 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	NIH (US)
Grant ID	`10871190`

Grant Description

Haplotype assembly is the problem of reconstructing the combination of alleles on the maternally and paternally inherited chromosome copies, and it is key to our understanding of human population genetics and disease. Numerous statistical and molecular approaches have been developed to enable haplotype reconstruction. In this work, we focus on read-based phasing of individual genomes, which involves assembling the two haplotypes from whole-genome-sequencing read alignments and variant genotypes. Fragments that span more than one heterozygous variant provide molecular linkage evidence for alleles occurring on the same haplotype and can hence be leveraged for haplotype assembly; however, sequencing errors make this problem challenging. Existing techniques often employ an NP-hard combinatorial optimization formulation for this problem and rely on hand-engineered heuristics to find a solution.

Here we propose a novel framework based on deep reinforcement learning, which integrates the representational power of deep learning with reinforcement learning, to automatically learn effective algorithms that can accurately partition read fragments into two haplotype sets given inputs from different sequencing platforms. Importantly, this approach does not require labeled training data, which allows us to use the publicly available datasets collected in large-scale sequencing repositories, such as the 1000 Genomes Project, as training data for our models. Given the complex combinatorial structure of genomic data, an important aspect of this work is the design and compilation of a representative training dataset to ensure model generalizability. Our preliminary results show that our approach can achieve state-of-the-art phasing block lengths and lower error rates on short-read inputs.
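To make the partitioning problem concrete, the following is a minimal sketch of how a candidate split of read fragments into two haplotype sets might be scored. It assumes the minimum error correction (MEC) objective, a formulation commonly used in the phasing literature; the description does not name the formulation used in this project, so that choice, the fragment encoding (a dict from variant index to allele 0 or 1), and the function names `consensus` and `mec_score` are all illustrative assumptions.

```python
from collections import Counter


def consensus(fragments):
    """Majority-vote allele at each variant covered by the given fragments."""
    votes = {}
    for frag in fragments:
        for variant, allele in frag.items():
            votes.setdefault(variant, Counter())[allele] += 1
    # Ties resolve to allele 0 purely for determinism (an arbitrary choice).
    return {v: (1 if c[1] > c[0] else 0) for v, c in votes.items()}


def mec_score(fragments, assignment):
    """Count allele calls that disagree with the consensus of their own set.

    `assignment[i]` is 0 or 1, the haplotype set chosen for fragment i.
    A lower score means the partition explains the reads with fewer errors.
    """
    groups = ([], [])
    for frag, side in zip(fragments, assignment):
        groups[side].append(frag)
    errors = 0
    for group in groups:
        hap = consensus(group)
        for frag in group:
            errors += sum(1 for v, a in frag.items() if hap[v] != a)
    return errors


# Hypothetical toy data: five fragments over three heterozygous variants.
fragments = [
    {0: 0, 1: 0},
    {1: 0, 2: 1},
    {0: 1, 1: 1},
    {1: 1, 2: 0},
    {0: 0, 1: 1},  # simulated sequencing error at variant 1
]

good_partition = [0, 0, 1, 1, 0]
poor_partition = [0, 1, 0, 1, 0]

print(mec_score(fragments, good_partition))  # 1
print(mec_score(fragments, poor_partition))  # 4
```

Under these assumptions, a learning agent could be rewarded for assignments that lower the score, which requires no ground-truth haplotype labels. Whether the funded framework uses this particular objective or reward is not stated in the description.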

All Grantees

Broad Institute, Inc.
