Completed STANDARD GRANT National Science Foundation (US)

Novel Missing Data Approaches for Corrupted Longitudinal Data

$1.47M USD

Funder	National Science Foundation (US)
Recipient Organization	University of Washington
Country	United States
Start Date	Jul 15, 2021
End Date	Jun 30, 2025
Duration	1,446 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2112907`

Grant Description

Modern longitudinal databases, which may combine multiple datasets and involve measurements that change over time, are often incomplete or unclean. This incompleteness makes downstream statistical analysis difficult. This project aims to develop novel statistical methods that handle incompleteness from a missing data perspective.

The project will also address a data linkage problem arising in cancer research, where researchers have linked clinical trial data to the Centers for Medicare & Medicaid Services database. The methods under development will be applied to the National Alzheimer’s Coordinating Center database and to Prostate Cancer Prevention Trial data from the Southwest Oncology Group (SWOG) Cancer Research Network.

The methods will also be used to address data collection problems caused by the COVID-19 pandemic and other infectious disease outbreaks that disrupt data collection. The project also provides training for graduate students and research opportunities for undergraduate students.

The project focuses on three research questions. First, the PI aims to develop an inverse probability weighting (IPW) approach for linking one longitudinal database with another. The IPW method reweights each linked observation by the inverse of its linking probability, to account for records that cannot be linked.
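
As a rough illustration of the reweighting idea, the sketch below computes a normalized (Hájek) IPW estimate of a mean when outcomes are available only for records that linked. It is not the PI's estimator: the function name `ipw_mean`, the simulated data, and the assumption that linking probabilities are known are all illustrative. In practice the probabilities would have to be estimated, for example by a logistic regression of the linkage indicator on covariates.

```python
import numpy as np

def ipw_mean(y, linked, link_prob):
    """Normalized (Hajek) IPW estimate of the population mean of y.

    y         : outcomes; entries for unlinked records are ignored (may be NaN)
    linked    : boolean array, True if the record was successfully linked
    link_prob : probability that each record links (assumed known and > 0)
    """
    y = np.asarray(y, dtype=float)
    linked = np.asarray(linked, dtype=bool)
    p = np.asarray(link_prob, dtype=float)
    if np.any(p <= 0):
        raise ValueError("linking probabilities must be strictly positive")
    # Each linked record stands in for 1/p similar records; unlinked records get weight 0.
    w = np.where(linked, 1.0 / p, 0.0)
    # Replace missing outcomes with 0 so NaN never enters the weighted sum.
    y_obs = np.where(linked, y, 0.0)
    return np.sum(w * y_obs) / np.sum(w)

# Hypothetical simulation: records with larger x are more likely to link.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)        # true mean of y is 2.0
link_prob = 1.0 / (1.0 + np.exp(-(0.5 + x)))  # linkage depends on x only
linked = rng.uniform(size=n) < link_prob
y_seen = np.where(linked, y, np.nan)          # outcomes exist only for linked records

naive = np.nanmean(y_seen)                    # complete-case mean, biased upward
ipw = ipw_mean(y_seen, linked, link_prob)
print(f"naive: {naive:.3f}  IPW: {ipw:.3f}")
```

The complete-case mean is pulled upward because records with large x link more often. Weighting each linked record by 1/p restores the balance, provided every linking probability is bounded away from zero.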

The IPW method will then be applied to the data linkage problem, and the PI will develop new efficiency theory for it.

The second part of the project considers the changing-measurement problem in longitudinal databases, in which a measurement is updated to a newer version while longitudinal data are still being collected. The PI intends to formulate this as a missing data problem and to introduce a new approach that combines latent variable modeling and quantile regression to construct a conversion between the old and new versions of the measurement.
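
The idea of a conversion between versions can be illustrated with a much simpler, swapped-in technique: unconditional equipercentile linking, which maps a score on the new version to the old-version score with the same percentile rank. This is not the latent variable and quantile regression approach described above. It ignores covariates and assumes, hypothetically, that reference samples of both versions come from comparable populations. The function name `equipercentile_convert` and the simulated score scales are illustrative.

```python
import numpy as np

def equipercentile_convert(new_scores, ref_new, ref_old):
    """Map new-version scores onto the old version's scale by matching percentiles.

    ref_new, ref_old : reference samples of each version, assumed to come
                       from comparable populations
    """
    ref_new = np.sort(np.asarray(ref_new, dtype=float))
    new_scores = np.asarray(new_scores, dtype=float)
    # Empirical percentile rank of each new score within the new-version sample;
    # averaging the left and right insertion points treats ties symmetrically.
    left = np.searchsorted(ref_new, new_scores, side="left")
    right = np.searchsorted(ref_new, new_scores, side="right")
    ranks = (left + right) / (2.0 * len(ref_new))
    # Old-version score at the same percentile rank.
    return np.quantile(np.asarray(ref_old, dtype=float), ranks)

# Hypothetical reference samples: old version ~ N(20, 4), new version ~ N(50, 10).
rng = np.random.default_rng(1)
ref_old = rng.normal(20.0, 4.0, size=50_000)
ref_new = rng.normal(50.0, 10.0, size=50_000)

converted = equipercentile_convert([40.0, 50.0, 60.0], ref_new, ref_old)
print(np.round(converted, 1))
```

In this simulation the two scales are linearly related (old = 20 + 0.4 * (new - 50)), so new-version scores of 40, 50, and 60 should map to roughly 16, 20, and 24. A quantile regression version would instead let the matched percentiles depend on covariates.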

In the third part of the project, the PI plans to develop a "doubly" semi-parametric estimator for handling missingness in both responses and covariates, and to study its efficiency theory. The PI will design a set of identifying assumptions on the missingness of covariates to ensure identifiability. From these assumptions, a method will be derived to impute the missing covariates, reducing the problem to the standard setting in which only responses are missing.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Washington
