Completed NON-SBIR/STTR RPGS NIH (US)

Deep tensor genomic imputation

$3.84M USD

Funder	NATIONAL HUMAN GENOME RESEARCH INSTITUTE
Recipient Organization	University of Washington
Country	United States
Start Date	Feb 01, 2021
End Date	Jan 31, 2025
Duration	1,460 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	NIH (US)
Grant ID	`10335796`

Grant Description

Project Summary/Abstract

High-throughput sequencing assays allow scientists to measure biochemical properties like transcription factor binding, histone modifications, and gene expression in nearly any cell line or primary tissue (“biosample”). Unfortunately, measuring all possible biochemical properties in every biosample is infeasible, both because of limited sample availability and because the cost would be prohibitive. We have previously developed a state-of-the-art imputation method, called Avocado, that can fill in the holes in such data sets. Avocado couples tensor factorization with a deep neural network. The method is scalable to large data sets and provides more accurate imputations than competing methods such as ChromImpute or PREDICTD. We have already applied Avocado systematically to the NIH ENCODE data set and made the imputations publicly available via the ENCODE web portal. Here, we propose to extend Avocado in four important ways. First, we will extend Avocado to handle single-cell data sets, thereby effectively turning each single-cell experiment into an in silico co-assay that measures multiple properties of each cell in parallel. Second, we will extend Avocado to work with data such as Hi-C, which measures three-dimensional properties of DNA. The extension involves converting Avocado's 3D tensor (biosample × assay × genomic position) to a 4D tensor with two genomic position axes. This extension will apply to a wide variety of data types, including various types of Hi-C data, SPRITE, GAM, ChIA-PET and PLAC-seq. Third, we will enhance Avocado to use variant-aware genomic sequence to enable high-resolution imputation of regulatory profiles. Finally, we will leverage the imputed data to infer cis-regulatory sequence annotations and the molecular impact of regulatory non-coding variants in one of the most comprehensive collections of cellular contexts. All of the software produced by this project will be open source, and all of the imputed data and latent factorizations will be made publicly available via the web portals associated with the NIH 4D Nucleome and ENCODE Consortia, providing a valuable public resource for users of these data sets.
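
To make the tensor description in the abstract concrete, the following is a minimal NumPy sketch of the general idea: one latent factor matrix per tensor axis, with the factors for a given cell of the tensor concatenated and passed through a small neural network to produce a predicted signal. It is not Avocado's actual architecture or code. All sizes, names (`embed`, `mlp`, `init_mlp`), and the two-layer network are assumptions chosen for illustration, the weights are random rather than trained, and reusing a single position factor matrix for both position axes in the 4D case is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_biosamples, n_assays, n_positions = 4, 3, 50
k = 8  # latent dimension per axis (assumption)

# One latent factor matrix per tensor axis (random here; learned in practice).
biosample_f = rng.normal(size=(n_biosamples, k))
assay_f = rng.normal(size=(n_assays, k))
position_f = rng.normal(size=(n_positions, k))


def init_mlp(n_in, n_hidden=16):
    """Random weights for a tiny two-layer network (stand-in for a deep net)."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def mlp(params, x):
    """Forward pass: ReLU hidden layer, then a scalar output per input row."""
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)
    return (h @ params["W2"] + params["b2"])[..., 0]


def embed(factors, indices):
    """Look up and concatenate the latent factors for each index tuple."""
    indices = np.broadcast_arrays(*indices)
    return np.concatenate([f[i] for f, i in zip(factors, indices)], axis=-1)


# 3D case: biosample x assay x genomic position.
params_3d = init_mlp(3 * k)
b, a, p = np.meshgrid(
    np.arange(n_biosamples), np.arange(n_assays), np.arange(n_positions),
    indexing="ij",
)
tensor_3d = mlp(params_3d, embed([biosample_f, assay_f, position_f], [b, a, p]))

# 4D case: two genomic position axes, as for a Hi-C style contact map.
params_4d = init_mlp(4 * k)
p1, p2 = np.meshgrid(np.arange(n_positions), np.arange(n_positions), indexing="ij")
contact_map = mlp(
    params_4d,
    embed([biosample_f, assay_f, position_f, position_f], [0, 0, p1, p2]),
)

print(tensor_3d.shape)    # one value per (biosample, assay, position)
print(contact_map.shape)  # one value per (position, position) pair
```

In this sketch, imputing an unmeasured experiment amounts to evaluating the network at an index combination that was never observed; the 4D case only changes how many factor vectors are concatenated for each prediction.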

All Grantees

University of Washington

