| Field | Value |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | George Mason University |
| Country | United States |
| Start Date | Sep 15, 2021 |
| End Date | Aug 31, 2025 |
| Duration | 1,446 days |
| Number of Grantees | 4 |
| Roles | Principal Investigator; Former Principal Investigator; Co-Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2120318 |
This research project will develop computational tools for minimizing the impact of mismatched records on subsequent data analysis. To adequately address a research question of interest, multiple data sources often need to be combined. Record linkage is the process of identifying records in multiple data sources that pertain to the same entity.
Advances in record linkage and computation yield substantial opportunities for creating rich data products. At the same time, high data volumes, data quality issues, and the need for data anonymization and privacy increase the potential for mismatch error that can considerably disrupt subsequent analysis and in turn lead to incorrect conclusions. The tools to be developed in this project will help leverage the potential inherent in linked data by improving the integrity of a significant range of downstream statistical analyses.
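As a rough illustration of how mismatch error can distort a downstream analysis, the toy simulation below fits a simple linear regression before and after a fraction of the linked outcomes is assigned to the wrong records. This is a sketch under stated assumptions, not one of the project's methods: the function names (`ols_slope`, `simulate`), the Gaussian data model, and the mismatch mechanism (a random subset of records with their outcomes shuffled among themselves, independently of the covariate) are all hypothetical choices made for this example.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (model with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=5000, true_slope=2.0, mismatch_rate=0.3, seed=0):
    """Return (slope on correctly linked data, slope on mismatched data).

    Assumed toy setup: x comes from one file and y from another; a random
    fraction `mismatch_rate` of the links is wrong.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0.0, 1.0) for xi in x]

    # Pick the records whose links are wrong, then shuffle their outcomes
    # among themselves so each one receives another record's y value.
    k = int(mismatch_rate * n)
    wrong = rng.sample(range(n), k)
    donors = wrong[:]
    rng.shuffle(donors)

    y_linked = y[:]
    for i, j in zip(wrong, donors):
        y_linked[i] = y[j]

    return ols_slope(x, y), ols_slope(x, y_linked)


if __name__ == "__main__":
    clean, linked = simulate()
    print(f"slope with correct links:    {clean:.3f}")
    print(f"slope with mismatched links: {linked:.3f}")
```

Under these assumptions the slope estimated from the mismatched file is pulled toward zero, to roughly `(1 - mismatch_rate)` times the true slope, because the wrongly linked outcomes carry no information about the covariate they are paired with.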
All technical developments resulting from this project will be released as open-source software. Research results will be applied to large-scale survey data analysis and data linkages of interest to the Federal statistical agencies. The investigators will integrate the results of this project into their educational activities and will offer hands-on tutorials to train students, professionals, and scientists in the analysis of linked data. The project also will provide research opportunities and support for graduate students.
This research project will build on techniques in high-dimensional statistics and optimization to develop a suite of methods adjusting for and correcting mismatch error along with uncertainty quantification. The investigators will tackle a variety of problems whose solutions will require an appropriate balance of statistical, algorithmic, and practical aspects pertaining to specific real data applications.
The statistical properties of the methods will be rigorously studied theoretically, in simulation studies, and in various contemporary linked data problems. Post-linkage analytic scenarios to be investigated include modern semiparametric regression and common unsupervised multivariate analysis methods that have scarcely been studied in the context of linked data analysis.
Advances in optimal transport theory will be leveraged to correct mismatch error and hence improve data quality. This award is supported by the MMS Program and a consortium of Federal statistical agencies.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.