| Field | Details |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | University of Washington |
| Country | United States |
| Start Date | Jul 15, 2021 |
| End Date | Jun 30, 2025 |
| Duration | 1,446 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2112907 |
Modern longitudinal databases, which may combine multiple datasets and involve measurements that change over time, are often incomplete and unclean. This incompleteness makes downstream statistical analysis difficult. This project aims to develop novel statistical methods that address incompleteness from a missing-data perspective.
The project will also address a data linkage problem in cancer research, where researchers have linked their clinical trial data to the Centers for Medicare & Medicaid Services database. The methods under development will be applied to the National Alzheimer’s Coordinating Center database and to Prostate Cancer Prevention Trial data from the Southwest Oncology Group (SWOG) Cancer Research Network.
The methods will also be used to address data collection problems caused by the COVID-19 pandemic and other infectious disease outbreaks that disrupt data collection. The project offers training for graduate students and research opportunities for undergraduate students.
The project focuses on three research questions. First, the PI aims to develop an inverse probability weighting (IPW) approach to handle the linkage of one longitudinal database with another. The IPW method reweights each linked observation by the inverse of its linking probability, so that the linked records stand in for the full cohort and the linkage does not bias downstream estimates.
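To make the reweighting idea concrete, here is a minimal sketch, not the PI's method. It assumes that the linking probability depends only on a discrete stratum (for example, an age group) and can be estimated as the linked fraction within that stratum. It then computes a normalized (Hajek-type) IPW mean from the linked records. The function names `linkage_probabilities` and `ipw_mean` are hypothetical.

```python
from collections import defaultdict


def linkage_probabilities(strata, linked):
    """Estimate P(linked | stratum) as the linked fraction in each stratum.

    Assumption (hypothetical): linkage depends only on the discrete stratum.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_linked, n_total]
    for s, r in zip(strata, linked):
        counts[s][0] += int(r)
        counts[s][1] += 1
    return {s: n_linked / n_total for s, (n_linked, n_total) in counts.items()}


def ipw_mean(y, strata, linked):
    """Normalized IPW mean of y using only linked records.

    Each linked record is weighted by 1 / P(linked | stratum); unlinked
    records contribute only through the estimated probabilities.
    """
    probs = linkage_probabilities(strata, linked)
    num = 0.0
    den = 0.0
    for yi, s, r in zip(y, strata, linked):
        if r:
            w = 1.0 / probs[s]
            num += w * yi
            den += w
    return num / den


# Stratum A links rarely (1 of 4), stratum B always links (4 of 4).
strata = ["A"] * 4 + ["B"] * 4
linked = [True, False, False, False, True, True, True, True]
y = [10, None, None, None, 0, 0, 0, 0]  # outcome unavailable when unlinked

naive = sum(v for v, r in zip(y, linked) if r) / sum(linked)
print(naive)                        # 2.0: over-represents stratum B
print(ipw_mean(y, strata, linked))  # 5.0: each stratum gets its full weight
```

In this example, the naive mean of the linked records is pulled toward the stratum that links more often. The weights restore each stratum's share of the full cohort. In practice the linking probability would more likely be modeled from continuous covariates, for example with logistic regression.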
The project will apply the IPW method to the data linkage problem and develop a new efficiency theory for it. The second part of the project considers the changing-measurement problem in longitudinal databases, in which a measurement is updated to a newer version while longitudinal data are still being collected. The PI intends to formulate this as a missing data problem and to introduce a new approach that combines latent variable modeling and quantile regression to construct a conversion between the old and new versions of the measurement.
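The core idea behind a quantile-based conversion can be shown with a much simpler stand-in than the approach described above. The sketch below uses unconditional equipercentile mapping, with no latent variables and no covariates. It assumes that the old-version and new-version samples come from comparable populations. An old score is mapped to the new-version value at the same empirical percentile. The function name `convert_old_to_new` and the sample data are hypothetical.

```python
import bisect


def convert_old_to_new(x_old, old_sample, new_sample):
    """Map an old-version score to the new version at the same percentile.

    Simplified equipercentile mapping, assuming both samples measure the
    same underlying trait in comparable populations (hypothetical setup).
    """
    old_sorted = sorted(old_sample)
    new_sorted = sorted(new_sample)
    # Number of old-version values at or below x_old gives its empirical CDF level.
    count = bisect.bisect_right(old_sorted, x_old)
    # Smallest new-version value whose empirical CDF reaches that level:
    # ceil(count * n_new / n_old), computed in integers to avoid rounding error.
    rank = -(-count * len(new_sorted) // len(old_sorted))
    return new_sorted[max(rank - 1, 0)]


old_scores = [10, 20, 30, 40]  # hypothetical old-version measurements
new_scores = [15, 25, 35, 45]  # hypothetical new-version measurements

print(convert_old_to_new(20, old_scores, new_scores))  # 25
print(convert_old_to_new(40, old_scores, new_scores))  # 45
```

A regression-based version would let the conversion depend on covariates (for example, matching conditional quantiles given age). Handling individuals who are measured only on one version is where a missing-data formulation becomes necessary.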
In the third part of the project, the PI plans to develop a "doubly" semi-parametric estimator for handling missingness in both responses and covariates and to study its efficiency theory. The PI will design a set of identifying assumptions on the missingness of covariates to ensure identifiability. A method derived from these assumptions can then impute the missing covariates, reducing the problem to the standard setting in which only responses are missing.
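The two-stage reduction described above can be illustrated with a deliberately crude sketch. It is not the proposed semi-parametric estimator. The sketch rests on two assumptions:

* A discrete covariate `x` is missing at random given an always-observed auxiliary variable `z`, so it is imputed by the most common observed value within each `z` group.
* With `x` filled in, the response `y` is missing at random given `x`, so a standard IPW estimator can be used.

All names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mode


def impute_covariate(x, z):
    """Stage 1: fill missing x (None) with the mode of observed x in its z group.

    Assumption (hypothetical): x is missing at random given z.
    """
    observed = defaultdict(list)
    for xi, zi in zip(x, z):
        if xi is not None:
            observed[zi].append(xi)
    group_modes = {zi: mode(vals) for zi, vals in observed.items()}
    return [xi if xi is not None else group_modes[zi] for xi, zi in zip(x, z)]


def ipw_response_mean(y, x_complete):
    """Stage 2: IPW mean of y when y is missing (None) at random given discrete x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_responded, n_total]
    for yi, xi in zip(y, x_complete):
        counts[xi][0] += int(yi is not None)
        counts[xi][1] += 1
    num = 0.0
    den = 0.0
    for yi, xi in zip(y, x_complete):
        if yi is not None:
            w = counts[xi][1] / counts[xi][0]  # 1 / P(respond | x)
            num += w * yi
            den += w
    return num / den


z = ["a", "a", "a", "b", "b", "b"]
x = [1, 1, None, 0, None, 0]
y = [10, None, None, 2, 2, None]

x_filled = impute_covariate(x, z)
print(x_filled)                        # [1, 1, 1, 0, 0, 0]
print(ipw_response_mean(y, x_filled))  # 6.0
```

Single imputation ignores the uncertainty in the imputed covariates. A semi-parametric treatment with its own efficiency theory, as the project proposes, would need to account for that uncertainty.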
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.