| Field | Value |
|---|---|
| Funder | National Human Genome Research Institute |
| Recipient Organization | University of Washington |
| Country | United States |
| Start Date | Feb 01, 2021 |
| End Date | Jan 31, 2025 |
| Duration | 1,460 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | NIH (US) |
| Grant ID | 10335796 |
Project Summary/Abstract

High-throughput sequencing assays allow scientists to measure biochemical properties such as transcription factor binding, histone modifications, and gene expression in nearly any cell line or primary tissue (“biosample”). Unfortunately, measuring all possible biochemical properties in every biosample is infeasible, both because sample availability is limited and because the cost would be prohibitive. We have previously developed a state-of-the-art imputation method, called Avocado, that can fill in the holes in such data sets. Avocado couples tensor factorization with a deep neural network. The method is scalable to large data sets and provides more accurate imputations than competing methods such as ChromImpute or PREDICTD. We have already applied Avocado systematically to the NIH ENCODE data set and made the imputations publicly available via the ENCODE web portal.

Here, we propose to extend Avocado in four important ways. First, we will extend Avocado to handle single-cell data sets, thereby effectively turning each single-cell experiment into an in silico co-assay that measures multiple properties of each cell in parallel. Second, we will extend Avocado to work with data such as Hi-C, which measures three-dimensional properties of DNA. The extension involves converting Avocado's 3D tensor (biosample × assay × genomic position) to a 4D tensor with two genomic position axes. This extension will apply to a wide variety of data types, including various types of Hi-C data, SPRITE, GAM, ChIA-PET, and PLAC-seq. Third, we will enhance Avocado to use variant-aware genomic sequence to enable high-resolution imputation of regulatory profiles. Finally, we will leverage the imputed data to infer cis-regulatory sequence annotations and the molecular impact of regulatory non-coding variants in one of the most comprehensive collections of cellular contexts.

All of the software produced by this project will be open source, and all of the imputed data and latent factorizations will be made publicly available via the web portals associated with the NIH 4D Nucleome and ENCODE Consortia, providing a valuable public resource for users of these data sets.
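
As a rough illustration of the tensor layout the abstract describes, the sketch below pairs per-axis latent factors with a small neural network, first for a 3D tensor (biosample × assay × genomic position) and then for a 4D tensor with two genomic position axes. It is not Avocado's actual implementation: the function names, the sizes, the choice to reuse one set of position factors for both position axes, and the tiny untrained network are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
N_BIOSAMPLES, N_ASSAYS, N_POSITIONS = 4, 3, 10
N_FACTORS, N_HIDDEN = 5, 8


def make_factors(n_items, n_factors=N_FACTORS):
    """Return one latent vector per item along a single tensor axis."""
    return rng.normal(scale=0.1, size=(n_items, n_factors))


# One factor matrix per axis of the 3D tensor.
biosample_f = make_factors(N_BIOSAMPLES)
assay_f = make_factors(N_ASSAYS)
position_f = make_factors(N_POSITIONS)


def make_mlp(n_inputs, n_hidden=N_HIDDEN):
    """Randomly initialised one-hidden-layer network (untrained)."""
    return {
        "w1": rng.normal(scale=0.1, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(scale=0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def mlp_predict(params, x):
    """Map a concatenated factor vector to a single predicted signal value."""
    hidden = np.maximum(0.0, x @ params["w1"] + params["b1"])  # ReLU
    return (hidden @ params["w2"] + params["b2"]).squeeze(-1)


def impute_3d(params, b, a, p):
    """Predict one entry of the biosample x assay x position tensor."""
    x = np.concatenate([biosample_f[b], assay_f[a], position_f[p]])
    return mlp_predict(params, x)


def impute_4d(params, b, a, p1, p2):
    """Predict one entry of a 4D tensor with two position axes (e.g. a contact)."""
    x = np.concatenate(
        [biosample_f[b], assay_f[a], position_f[p1], position_f[p2]]
    )
    return mlp_predict(params, x)


mlp_3d = make_mlp(3 * N_FACTORS)  # three axes, so three factor vectors as input
mlp_4d = make_mlp(4 * N_FACTORS)  # the extra position axis adds a fourth

track_value = impute_3d(mlp_3d, b=0, a=1, p=7)
contact_value = impute_4d(mlp_4d, b=0, a=1, p1=2, p2=7)
```

In this sketch the only structural change between the two cases is the width of the network input. In a trained model, the factors and network weights would be fit to the observed entries of the tensor, and the unobserved entries would then be predicted in the same way.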