Completed CONTINUING GRANT National Science Foundation (US)

Adaptive Bayesian Models for Entity Resolution with Heterogeneous Data

$1.73M USD

Funder National Science Foundation (US)
Recipient Organization University of Florida
Country United States
Start Date Jun 01, 2021
End Date Jan 31, 2023
Duration 609 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2051911
Grant Description

Methods for integrating information from multiple sources have become critical for performing complete and accurate analyses on streams of data. The process of merging records and removing duplicate information from noisy data is known as entity resolution or record linkage. Entity resolution tasks are prevalent in many areas, including public health, human rights, official statistics, social networks, fraud detection, and national security.

Although probabilistic approaches to entity resolution have become more common in recent years, few principled approaches are also computationally tractable and scalable to large data sets. This project aims to develop Bayesian models and efficient computational algorithms suited to entity resolution tasks with heterogeneous types of data.

The methods will be made accessible to practitioners and other researchers through open-source software.

Entity resolution with multiple files can be treated as a clustering task in which similar records that represent the same latent entity are grouped together. In this context, a large number of small clusters or microclusters is expected. The following three general avenues of research will be explored: (a) adaptive prior distributions for random partitions that display microclustering properties and permit straightforward incorporation of prior information at different scales; (b) integrated Bayesian models suited for entity resolution tasks with social network data that are easily adaptable according to the nature of the available information; and (c) computational algorithms for model acceleration of entity resolution applications on big data.
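
As a rough illustration of the clustering view described above, the following Python sketch groups records that likely refer to the same entity. It is not the project's Bayesian model: it is a deterministic stand-in that links two records when a string-similarity score exceeds a threshold and then takes connected components. The function names (`similarity`, `resolve`), the 0.8 threshold, and the sample records are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (illustrative choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(records, threshold=0.8):
    """Group record indices into clusters of likely co-referent records.

    Records whose similarity meets the (assumed) threshold are linked,
    and clusters are the connected components of those links.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical records drawn from several files.
records = ["John Smith", "Jon Smith", "Maria Garcia", "Maria Garcia", "Wei Chen"]
clusters = resolve(records)
```

With these sample records, most clusters contain one or two records, which is the kind of small-cluster structure the abstract refers to as microclustering. A Bayesian treatment would instead place a prior over partitions and report uncertainty about which records are linked.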

A variety of Markov chain Monte Carlo algorithms and efficient alternatives for posterior inference in the microclustering setting of entity resolution will be explored to overcome the known practical limitations of Bayesian inference in high-dimensional discrete spaces.
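
To make posterior inference over partitions concrete, here is a minimal Python sketch of a collapsed Gibbs sampler on a toy problem. It rests on several assumptions that are not taken from the award abstract: each record is a single number observed with Gaussian noise (standard deviation `sigma`) around a latent entity value with a Gaussian prior (standard deviation `tau`), and the prior over partitions is a Chinese restaurant process with concentration `alpha`. The Chinese restaurant process is used only for brevity; it does not have the microclustering property that the project's proposed priors target. All function names and parameter values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def canonical(labels):
    """Represent a label vector as a sorted tuple of index tuples."""
    groups = {}
    for i, k in enumerate(labels):
        groups.setdefault(k, []).append(i)
    return tuple(sorted(tuple(g) for g in groups.values()))


def gibbs_partitions(x, sweeps=300, alpha=1.0, sigma=0.1, tau=5.0, seed=0):
    """Collapsed Gibbs sampling over partitions of the records in x.

    Assumed toy model: CRP(alpha) prior on the partition, latent entity
    values drawn from N(0, tau^2), records drawn from N(entity, sigma^2).
    """
    rng = random.Random(seed)
    n = len(x)
    labels = list(range(n))  # start with every record in its own cluster
    samples = []
    for _ in range(sweeps):
        for i in range(n):
            # Size and sum of each cluster, with record i held out.
            stats = {}
            for j in range(n):
                if j != i:
                    m, s = stats.get(labels[j], (0, 0.0))
                    stats[labels[j]] = (m + 1, s + x[j])
            options, weights = [], []
            for k, (m, s) in stats.items():
                denom = m * tau ** 2 + sigma ** 2
                mean = tau ** 2 * s / denom                       # posterior mean
                var = sigma ** 2 * tau ** 2 / denom + sigma ** 2  # predictive variance
                options.append(k)
                weights.append(m * normal_pdf(x[i], mean, var))
            # Option of opening a new cluster, using any unused label.
            options.append(next(k for k in range(n) if k not in stats))
            weights.append(alpha * normal_pdf(x[i], 0.0, tau ** 2 + sigma ** 2))
            labels[i] = rng.choices(options, weights=weights)[0]
        samples.append(canonical(labels))
    return samples


def coclustering(samples, i, j):
    """Fraction of sampled partitions that place records i and j together."""
    hits = sum(any(i in g and j in g for g in p) for p in samples)
    return hits / len(samples)


# Hypothetical data: two near-duplicate pairs and one isolated record.
x = [0.0, 0.02, 5.0, 5.03, -5.0]
samples = gibbs_partitions(x, seed=1)
```

Each sweep rescans all records to rebuild the cluster summaries, so the cost grows quadratically with the number of records. That is acceptable for a five-record toy example but hints at why scalable inference over partitions of large files is a research problem in its own right. The `coclustering` helper summarizes the output as pairwise linkage probabilities, one common way to report uncertainty about matches.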

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Florida
