| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | University of Florida |
| Country | United States |
| Start Date | Jun 01, 2021 |
| End Date | Jan 31, 2023 |
| Duration | 609 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2051911 |
Methods for integrating information from multiple sources have become critical for complete and accurate analysis of streams of data. The process of merging records and removing duplicate information from noisy data is known as entity resolution or record linkage. Entity resolution tasks are prevalent in many areas, including public health, human rights, official statistics, social networks, fraud detection, and national security.
Although probabilistic approaches to entity resolution have become more pervasive in recent years, few principled approaches are also computationally tractable and scalable to large data sets. This project aims to develop Bayesian models and efficient computational algorithms suited for entity resolution tasks with heterogeneous types of data.
The methods will be made accessible to practitioners and other researchers through open-source software.
Entity resolution with multiple files can be treated as a clustering task in which similar records that represent the same latent entity are grouped together. In this context, a large number of small clusters or microclusters is expected. The following three general avenues of research will be explored: (a) adaptive prior distributions for random partitions that display microclustering properties and permit straightforward incorporation of prior information at different scales; (b) integrated Bayesian models suited for entity resolution tasks with social network data that are easily adaptable according to the nature of the available information; and (c) computational algorithms for model acceleration of entity resolution applications on big data.
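To make the clustering view concrete, the following is a minimal illustrative sketch and not the project's Bayesian methodology. It assumes records are plain name strings pooled from several files, uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and links records whose similarity meets a hypothetical threshold of 0.8. The function names `similarity` and `resolve`, the threshold, and the toy records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(records: list[str], threshold: float = 0.8) -> list[list[int]]:
    """Partition record indices into clusters of likely co-referent records.

    Pairs whose similarity meets the (hypothetical) threshold are linked,
    and links are closed transitively with a union-find structure, so the
    result is a partition of the records.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters: dict[int, list[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy records, imagined as pooled from several files (illustrative only).
records = [
    "Jonathan Smith",
    "Jonathon Smith",
    "Maria Garcia",
    "maria garcia",
    "Wei Chen",
]
clusters = resolve(records)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

Each cluster here has only one or two records, and adding more entities would add more small clusters rather than enlarge existing ones. That is the microclustering behavior the abstract refers to. A deterministic threshold rule like this one gives no measure of uncertainty about the partition, which is what a Bayesian model over random partitions is meant to supply.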
A variety of Markov chain Monte Carlo (MCMC) algorithms and efficient alternatives for posterior inference in the microclustering setting of entity resolution will be explored to overcome the known practical limitations of Bayesian inference in high-dimensional discrete spaces.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.