Active CONTINUING GRANT National Science Foundation (US)

Bayesian Sparse Dirichlet-Multinomial Models for Discovering Latent Structure in High-Dimensional Compositional Count Data

$1.23M USD

Funder National Science Foundation (US)
Recipient Organization Colorado State University
Country United States
Start Date Sep 01, 2023
End Date Aug 31, 2026
Duration 1,095 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2245492
Grant Description

The collection and analysis of microbiome data have broad implications for furthering our understanding of human health and performance, agriculture, and ecology, among other areas. Human microbiome research, for example, aims to better understand the role of our microbial communities and how they interact with their host, respond to their environment, and influence disease.

Microbiome data are compositional, because the total number of microbial taxa reads in a sample is fixed, and they are high-dimensional. They are also zero-inflated: more zero reads are typically observed than expected, which has profound implications for modeling and inference. This project aims to advance statistical methods and computational algorithms for the analysis of zero-inflated multivariate compositional count data.

While developed to address the current challenges of microbiome data analysis, the methods will be generally applicable to other settings in which multivariate compositional count data with excess zeros are observed, including biomedical and public health research, econometrics, and ecology. The project will additionally provide educational and professional training and mentoring to graduate students.

Analyzing multivariate count data generated by high-throughput sequencing technology in omics research is challenging due to the high-dimensional and compositional structure of the data, over-dispersion, and potential zero inflation. In practice, researchers often use the Dirichlet-multinomial (DM) distribution and its variants to model these data. However, under the assumptions of a DM model, estimated probabilities for zero counts are strictly positive even if the true probability of occurrence is zero.
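The last point can be illustrated with a small numerical sketch. It assumes a standard DM setup with a known Dirichlet concentration vector, in which the posterior mean of each taxon probability is (count + alpha) / (total reads + sum of alpha). The counts, the value alpha = 0.5, and the function name `dm_posterior_mean` are made up for illustration and do not come from the project.

```python
import numpy as np

def dm_posterior_mean(counts, alpha):
    """Posterior mean of taxon probabilities under a Dirichlet-multinomial model.

    Assumes p ~ Dirichlet(alpha) and counts ~ Multinomial(n, p), so that
    p | counts ~ Dirichlet(alpha + counts).
    """
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha.sum())

# Hypothetical read counts for five taxa; two taxa were never observed.
counts = np.array([120, 0, 35, 0, 845])
alpha = np.full(5, 0.5)  # illustrative concentration, not a recommended value

post = dm_posterior_mean(counts, alpha)
for c, p in zip(counts, post):
    print(f"count={c:4d}  estimated probability={p:.6f}")
```

Because every entry of alpha is positive, the two taxa with zero reads still receive a small positive estimated probability. The model cannot express that a taxon is truly absent.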

This research project aims to develop a novel sparse DM (sDM) model that allows the probabilities associated with zero counts to be exactly zero, accommodating potential zero inflation in multivariate compositional count data while simultaneously estimating the compositional probabilities. Additionally, this project will investigate extensions of the sDM modeling framework to high-dimensional variable selection and clustering problems and contribute Markov chain Monte Carlo algorithms for posterior inference that will be made publicly available to practitioners and other researchers.
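The description does not specify how the sDM model is constructed. As a rough sketch of one way a model could place exact zeros on compositional probabilities, the example below masks normalized gamma variates with binary presence indicators before drawing multinomial counts. This construction, the function name `sample_sparse_dm`, and the parameters `zero_prob` and `alpha` are assumptions made for illustration. They should not be read as the project's actual model.

```python
import numpy as np

def sample_sparse_dm(rng, alpha, zero_prob, n_reads):
    """Simulate one sample from a hypothetical sparse DM-style process.

    Each taxon is structurally absent with probability zero_prob. Present taxa
    get probabilities from normalized gamma variates, which is equivalent to a
    Dirichlet draw restricted to the present taxa.
    """
    alpha = np.asarray(alpha, dtype=float)
    present = rng.random(alpha.size) >= zero_prob  # True = taxon can occur
    if not present.any():
        # Keep at least one taxon so the probabilities are well defined.
        present[rng.integers(alpha.size)] = True
    gammas = rng.gamma(shape=alpha, scale=1.0)
    weights = np.where(present, gammas, 0.0)  # absent taxa get exactly zero
    p = weights / weights.sum()
    x = rng.multinomial(n_reads, p)
    return present, p, x

rng = np.random.default_rng(1)
n_reads = 1000
present, p, x = sample_sparse_dm(rng, np.full(20, 1.0), zero_prob=0.4, n_reads=n_reads)
print("structurally absent taxa:", int((~present).sum()))
print("taxa with zero reads:    ", int((x == 0).sum()))
```

In this sketch a zero read can arise in two ways: the taxon is structurally absent (probability exactly zero), or it is present but was not sampled. Separating those two cases is the kind of inference problem a sparse model has to address.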

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Colorado State University
