| Field | Value |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | University of California-Davis |
| Country | United States |
| Start Date | Jul 01, 2022 |
| End Date | Jun 30, 2025 |
| Duration | 1,095 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2210891 |
The proposed research brings together two distinct fields, functional data analysis (FDA) and deep learning. Functional data are observations of random functions, such as curves; they have become increasingly common as technological advances have made it possible to collect and handle massive data. Examples include climate or air pollution measurements recorded over a period of time.
FDA has emerged as a mainstream research area, but its literature focuses mainly on estimation problems and has not yet leveraged the advantages of deep learning. This project aims to fill both gaps: it develops several new hypothesis tests for functional data, and it employs deep learning, instead of conventional nonparametric smoothing methods, to handle functional data.
The proposed approaches will be applied to various functional data, including evaluating the effect of pollutants on lung cancer mortality and explaining the effects of physical activity on health. A major emphasis is the development of new theory and algorithms. Computer code associated with the research will be publicly disseminated as R or Python packages.
The research findings will be incorporated into graduate curricula, undergraduate research projects, and short courses at workshops. They will also be presented at professional meetings. Student researchers will receive training in research, computing, and communication skills.
Although functional data are intrinsically infinite dimensional, measurements are only available at discrete locations, which may vary from subject to subject. The number of measurement locations per subject can be small (sparse functional data) or grow with the sample size (intensely sampled functional data). The proposed research covers all types of sampling plans and employs, whenever feasible, a single platform that is universally applicable.
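To illustrate what a single platform for every sampling plan can look like, here is a minimal sketch of one classical device, swapped in for illustration and not necessarily the project's own method: pool every (time, value) pair from all subjects and apply one local linear smoother to estimate the mean function. The simulated model, the function names (`simulate`, `pooled_local_linear_mean`), the Gaussian kernel, and the bandwidth `h = 0.05` are all assumptions. The same code runs unchanged on a sparse design (2 to 5 points per subject) and an intensely sampled one (80 to 120 points per subject).

```python
import numpy as np

def simulate(n_subjects, m_min, m_max, rng):
    """Hypothetical model: X_i(t) = sin(2*pi*t) + xi_i*cos(2*pi*t), observed with noise
    at m_i uniform random locations, with m_i drawn from {m_min, ..., m_max}."""
    data = []
    for _ in range(n_subjects):
        m = rng.integers(m_min, m_max + 1)
        t = np.sort(rng.uniform(0.0, 1.0, size=m))
        xi = rng.normal(scale=0.5)  # subject-specific random score
        y = np.sin(2 * np.pi * t) + xi * np.cos(2 * np.pi * t) + rng.normal(scale=0.1, size=m)
        data.append((t, y))
    return data

def pooled_local_linear_mean(data, grid, h):
    """Estimate mu(t) by pooling all (t_ij, y_ij) pairs and fitting a local line at each grid point."""
    T = np.concatenate([t for t, _ in data])
    Y = np.concatenate([y for _, y in data])
    est = np.empty(len(grid))
    for g, t0 in enumerate(grid):
        d = T - t0
        w = np.exp(-0.5 * (d / h) ** 2)  # Gaussian kernel weights
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        r0, r1 = (w * Y).sum(), (w * d * Y).sum()
        # Intercept of the weighted least-squares fit Y ~ a + b*d estimates mu(t0)
        est[g] = (s2 * r0 - s1 * r1) / (s0 * s2 - s1 ** 2)
    return est

rng = np.random.default_rng(1)
grid = np.linspace(0.1, 0.9, 9)
sparse = simulate(400, 2, 5, rng)    # sparse design: 2-5 observations per subject
dense = simulate(100, 80, 120, rng)  # intense design: 80-120 observations per subject
mu_sparse = pooled_local_linear_mean(sparse, grid, h=0.05)
mu_dense = pooled_local_linear_mean(dense, grid, h=0.05)
```

Every observation gets equal weight in this sketch. How to weight subjects with very different numbers of measurements is exactly the kind of choice a unified theory has to settle, and the sketch does not attempt it.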
A unified approach is important because it is often not obvious whether a particular dataset is intensely or sparsely sampled. It also has the merit that the theory is unified and automatically reveals the phase transitions in the convergence rates of the corresponding estimators.
Project 1 (Hypothesis Testing for Functional Linear Models) aims to develop a general framework for hypothesis testing in functional linear models.
Existing methods test a specific null hypothesis with a tailored test and are not well suited to testing the temporal duration of the effect of a functional covariate, such as the impact of PM2.5 on lung cancer mortality. None of them has been shown to be optimal for a composite null hypothesis. We propose a single platform to test the null hypothesis that the regression coefficient function of a functional covariate lies in a closed subspace of all possible coefficient functions.
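To make the subspace null concrete, here is a minimal sketch under strong simplifying assumptions; it is not the project's test. The covariate curves are observed on a common, equally spaced grid on [0, 1]. The coefficient function β is approximated as piecewise constant on equal bins, so the "duration" null "β(t) = 0 on the last bins" becomes a linear subspace. That null is then checked with a standard nested-model F-test comparing the residual sums of squares of the full and restricted least-squares fits. The helper names, the bin basis, and the simulated data are hypothetical choices.

```python
import numpy as np
from scipy import stats

def bin_index(m, n_bins):
    """Bin label (0..n_bins-1) of each point of an equally spaced m-point grid on [0, 1]."""
    return np.minimum(np.arange(m) * n_bins // (m - 1), n_bins - 1)

def bin_integrals(X, n_bins):
    """Z[i, k] approximates the integral of X_i over bin k (Riemann sum, dt = 1/(m-1))."""
    m = X.shape[1]
    labels = bin_index(m, n_bins)
    dt = 1.0 / (m - 1)
    return np.stack([X[:, labels == k].sum(axis=1) * dt for k in range(n_bins)], axis=1)

def rss(design, y):
    """Residual sum of squares of the ordinary least-squares fit of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return resid @ resid

def f_test_vanishes_after(X, y, n_bins, n_keep):
    """F-type test of H0: beta(t) = 0 on the last n_bins - n_keep bins."""
    n = len(y)
    Z = bin_integrals(X, n_bins)
    ones = np.ones((n, 1))
    full = np.hstack([ones, Z])                    # beta unrestricted (piecewise constant)
    restricted = np.hstack([ones, Z[:, :n_keep]])  # beta constrained to the null subspace
    q = n_bins - n_keep                            # number of linear constraints
    df = n - full.shape[1]                         # residual degrees of freedom
    rss_full, rss_null = rss(full, y), rss(restricted, y)
    F = ((rss_null - rss_full) / q) / (rss_full / df)
    return F, stats.f.sf(F, q, df)

# Simulated example: Brownian-motion-like curves on a 101-point grid.
rng = np.random.default_rng(0)
n, m, n_bins = 400, 101, 10
dt = 1.0 / (m - 1)
X = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n, m)), axis=1)
t = np.linspace(0.0, 1.0, m)
beta_null = (bin_index(m, n_bins) < 5).astype(float)  # vanishes on [0.5, 1]: H0 true
beta_alt = 2.0 * t                                     # nonzero on [0.5, 1]: H0 false
y_null = X @ beta_null * dt + rng.normal(scale=0.2, size=n)
y_alt = X @ beta_alt * dt + rng.normal(scale=0.2, size=n)
F_null, p_null = f_test_vanishes_after(X, y_null, n_bins, n_keep=5)
F_alt, p_alt = f_test_vanishes_after(X, y_alt, n_bins, n_keep=5)
```

With Gaussian errors and the null coefficient lying exactly in the binned subspace, the statistic follows an F distribution with `q` and `df` degrees of freedom under H0. Global nullity corresponds to `n_keep=0`. Sparse or irregular designs, and the optimality questions the project raises, are outside this sketch.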
The proposed test, which resembles the classical F-test, is simple and includes as special cases tests of global nullity, partial nullity, and the domain of the coefficient function.
Project 2 (Testing Homogeneity and Independence for Functional Data) addresses two fundamental tasks: testing the homogeneity (equality of distributions) and the independence of functional data.
Such tests are infeasible when the functional process can be sampled at only a few discrete locations, a situation that is ubiquitous in longitudinal studies. For each task, we therefore propose a tailored version, marginal homogeneity or marginal independence, respectively, that retains practical relevance while remaining tractable in both theory and implementation.
Project 3 (Deep Learning for Functional Data) aims to bring the success of deep learning to bear on functional data.
Surprisingly, deep neural networks have rarely been applied to functional data, and how best to do so remains an open problem. A recent approach, developed by a team led by the PI, uses neural networks to learn optimal basis functions for representing a functional input, so that the representation automatically adapts to the prediction task at hand. We propose to expand the reach and theoretical understanding of this adaptive basis approach.
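The adaptive basis idea can be sketched as a forward pass. The sketch assumes, hypothetically, that each of K basis functions β_k(t) is a small multilayer perceptron of t, that each curve is summarized by the scores c_k = ∫ X(t) β_k(t) dt computed with trapezoidal quadrature, and that a prediction head acts on those scores. The class and function names, the tanh activation, and the linear head are illustrative choices, not the published approach. Training, in which the basis networks and the head would be fit jointly to the prediction loss, is omitted.

```python
import numpy as np

def trapezoid_weights(t):
    """Weights w with sum(w * f(t)) approximating the integral of f over [t[0], t[-1]]."""
    dt = np.diff(t)
    w = np.zeros_like(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w

class AdaptiveBasisLayer:
    """Hypothetical sketch: K basis functions, each a one-hidden-layer MLP mapping t to beta_k(t)."""

    def __init__(self, n_basis, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(size=(n_basis, 1, hidden))
        self.b1 = rng.normal(size=(n_basis, 1, hidden))
        self.W2 = rng.normal(size=(n_basis, hidden, 1)) / np.sqrt(hidden)

    def basis(self, t):
        """Evaluate all K basis functions on grid t; returns an array of shape (K, len(t))."""
        h = np.tanh(t[None, :, None] * self.W1 + self.b1)  # (K, m, hidden)
        return (h @ self.W2)[..., 0]                        # (K, m)

    def scores(self, X, t):
        """c[i, k] ~ integral of X_i(t) * beta_k(t) dt for curves X of shape (n, len(t))."""
        return (X * trapezoid_weights(t)) @ self.basis(t).T  # (n, K)

def predict(layer, head_w, head_b, X, t):
    """Linear prediction head on the K scores; any network could take its place."""
    return layer.scores(X, t) @ head_w + head_b

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 51)
X = rng.normal(size=(8, t.size))                        # 8 curves on a common 51-point grid
layer = AdaptiveBasisLayer(n_basis=3)
C = layer.scores(X, t)                                  # (8, 3) basis scores
y_hat = predict(layer, rng.normal(size=3), 0.0, X, t)   # (8,) untrained predictions
```

Because the basis functions are continuous functions of t, the same layer can in principle be evaluated on any grid, which is one reason this kind of parameterization is attractive for functional inputs.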
Another objective is to design new methodology, based on Transformers, to impute partially observed functional data. A Transformer is a deep neural network that transforms a given sequence of elements, such as the sequence of words in a sentence, into another sequence. The project will offer a broad range of new opportunities for interdisciplinary training of a future generation of statisticians and will help foster a more inclusive atmosphere in the statistical sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.