| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | University of North Carolina At Chapel Hill |
| Country | United States |
| Start Date | Sep 01, 2021 |
| End Date | Aug 31, 2025 |
| Duration | 1,460 days |
| Number of Grantees | 2 |
| Roles | Principal Investigator; Co-Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2113404 |
A challenge in analyzing big data is that multiple types of measurements are often generated simultaneously. This research addresses this challenge from the perspective of multi-block (also known as multi-view) data, focusing in particular on multiple types of measurements made on the same objects.
One common example is that of multiple 'omics' measurements (e.g. gene and protein expression) in biology and medicine. The data analytic challenge is to understand how the different measurements work together, in terms of modes of joint variation, and how they work independently in terms of individual modes of variation. The proposed new methodology is named DIVAS, an acronym for Data Integration Via Analysis of Subspaces.
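The following minimal sketch on simulated data makes the joint/individual distinction concrete. It is not DIVAS. For illustration it assumes that each block is a features × objects matrix whose score vectors live in the space of the shared objects. The helper names (`make_block`, `score_space`, `principal_angles_deg`) and all sizes are hypothetical. A score shared by both blocks appears as a principal angle near 0° between the two estimated score subspaces. The block-specific scores leave a large angle.

```python
import numpy as np

def principal_angles_deg(A, B):
    """Principal angles (degrees, ascending) between the column spaces of A and B."""
    qa, _ = np.linalg.qr(A)
    qb, _ = np.linalg.qr(B)
    # Singular values of qa^T qb are the cosines of the principal angles.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

def make_block(d, scores, rng, noise=0.1):
    """Hypothetical toy block: d features x n objects, low rank plus Gaussian noise."""
    loadings = rng.standard_normal((d, len(scores)))
    signal = loadings @ np.vstack(scores)
    return signal + noise * rng.standard_normal(signal.shape)

def score_space(X, rank):
    """Orthonormal basis (n x rank) from the leading right singular vectors of X."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:rank].T

rng = np.random.default_rng(0)
n = 50                          # number of shared objects (e.g. patients)
z = rng.standard_normal(n)      # joint score, present in both blocks
u1 = rng.standard_normal(n)     # individual score, block 1 only
u2 = rng.standard_normal(n)     # individual score, block 2 only

X1 = make_block(30, [z, u1], rng)   # stands in for, e.g., gene expression
X2 = make_block(40, [z, u2], rng)   # stands in for, e.g., protein expression

angles = principal_angles_deg(score_space(X1, 2), score_space(X2, 2))
print(np.round(angles, 1))  # first angle near 0 (joint), second far from 0 (individual)
```

With real data neither the ranks nor the split into joint and individual parts is known in advance. Estimating them, with accompanying inference, is the hard part that the proposed methodology targets.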
An important underlying theme is that the most useful new data analytic methods are invented in the context of interdisciplinary collaboration. The project also provides research training opportunities for graduate students.
DIVAS will be a breakthrough in analysis methods for multi-block data in several ways. First, the algorithm is completely different from existing methods, using a new structure deliberately designed to accommodate partially shared structure, that is, modes of variation shared by some but not all of the blocks. Second, statistical inference is deliberately incorporated into all aspects of DIVAS, instead of being mostly an afterthought as in most competing methods.
In particular, our applications motivate inference on both scores and loadings, which will be performed using novel concepts based on principal angles. Third, the algorithm is based on an innovative combination of perturbation bounds and random direction bounds, drawing on ideas from probability theory, linear algebra, approximation theory, and optimization.
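Principal angles suggest one generic way to judge whether an observed angle is meaningfully small: compare it with the angles that purely random directions make with the same subspace. The sketch below estimates that reference distribution by Monte Carlo. It only illustrates the random-direction idea, under the assumption of uniformly random unit vectors. The actual DIVAS bounds are not specified here and may be constructed quite differently. The function names and the 5% quantile are hypothetical choices.

```python
import numpy as np

def angle_to_subspace_deg(v, Q):
    """Angle in degrees between vector v and the column space of Q (orthonormal columns)."""
    v = v / np.linalg.norm(v)
    cosine = np.linalg.norm(Q.T @ v)  # length of the projection of v onto span(Q)
    return float(np.degrees(np.arccos(min(cosine, 1.0))))

def random_direction_angles(Q, draws, rng):
    """Monte Carlo angles between uniformly random directions and span(Q)."""
    n = Q.shape[0]
    # A standard Gaussian vector, once normalized, is uniform on the unit sphere.
    return np.array([angle_to_subspace_deg(rng.standard_normal(n), Q) for _ in range(draws)])

rng = np.random.default_rng(1)
n, r = 100, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))  # a fixed 3-dimensional subspace of R^100
null_angles = random_direction_angles(Q, draws=2000, rng=rng)
threshold = np.quantile(null_angles, 0.05)  # random directions rarely get closer than this

inside = Q @ np.array([1.0, -0.5, 2.0])  # lies exactly in span(Q)
print(round(threshold, 1), angle_to_subspace_deg(inside, Q) < threshold)
```

In 100 dimensions a random direction typically sits roughly 70° to 80° away from a 3-dimensional subspace. So an estimated direction within a few degrees of that subspace is far closer than chance alone would explain.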
Theoretical validation will be performed using an unusually broad range of asymptotics, motivated by the breadth of the driving applications. To keep the focus on the most important aspects of the methodology, DIVAS will be developed in direct collaboration with experts (including unfunded collaborators) in other scientific areas. One area is breast cancer research, which typically involves very high-dimensional data.
We will investigate that domain mathematically using High Dimension, Low Sample Size (HDLSS) asymptotics, in which the dimension grows to infinity while the sample size stays fixed. The other area is Drosophila behavioral genetics, where the data are typically low dimensional, which creates additional methodological challenges. Here the completely different classical asymptotics, in which the sample size grows for fixed dimension, provide the best insight into the performance of the method.
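The HDLSS regime has a well-known geometric signature. When the sample size is fixed and the dimension grows, independent Gaussian points become nearly equidistant, with pairwise distances close to √(2d). The simulation below assumes i.i.d. standard Gaussian data, an idealization rather than actual breast cancer data. It shows the scaled distances collapsing toward √2 ≈ 1.414 as `d` grows while `n = 5` stays fixed. The function name is hypothetical.

```python
import numpy as np

def scaled_pairwise_distances(n, d, rng):
    """Distances between all pairs of n i.i.d. N(0, I_d) points, divided by sqrt(d)."""
    X = rng.standard_normal((n, d))
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(n, k=1)  # each unordered pair once
    return dist[upper] / np.sqrt(d)

rng = np.random.default_rng(2)
n = 5  # sample size held fixed while the dimension grows
for d in (10, 1_000, 100_000):
    s = scaled_pairwise_distances(n, d, rng)
    print(d, round(float(s.min()), 3), round(float(s.max()), 3))  # both approach sqrt(2)
```

Under classical asymptotics the roles reverse: the dimension is fixed and the sample size grows, so this concentration does not occur. This is one reason the two application areas call for different theory.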
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.