Loading…

Loading grant details…

Completed STANDARD GRANT National Science Foundation (US)

CIF: Small: New Directions in Clustering: Interactive Algorithms and Statistical Models

$4.89M USD

Funder National Science Foundation (US)
Recipient Organization University of California-San Diego
Country United States
Start Date Jun 01, 2021
End Date Oct 31, 2024
Duration 1,248 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2133484
Grant Description

Clustering, or partitioning data-points based on pairwise similarities, is one of the most important problems of unsupervised learning and data mining. Algorithms for clustering often suffer from low accuracy and high time complexity due to processing massive amounts of data that are noisy, and for lack of a benchmark to validate the clustering output.

This project studies new directions for clustering algorithms and their fundamental limits by incorporating several new technical tools from information theory. Scalable algorithms for clustering with theoretical guarantees and practical validation should have high impact in all areas of data science.

This project aims to make use of interactive algorithms that adaptively gather labeled data to improve accuracy of clustering and related machine learning problems. In recent years, wide-spread use of crowdsourcing has facilitated the development of such algorithms. The algorithms that will be developed, as well as the lower bounding methods, extend to a large class of machine learning problems, and help understand trade-off between information theoretic optimality and computational efficiency.

A generalized setup taking into account erroneous interactions, use of similarity measures, overlapping clusters and parallel interactions in design of algorithms is being pursued. Connections of this setup with well-known random graph community models such as the stochastic block model will also be explored in this project. New statistical models, such as the geometric block model, will be studied as an alternative to the popular stochastic block model. Algorithms for community recovery for the geometric block model and

information theoretic lower bounds will be developed. This project will also develop statistical models that faithfully capture the properties of real networks to provide a benchmark for testing clustering algorithms. These include statistical models for community detection and interactive clustering.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of California-San Diego

Advertisement
Apply for grants with GrantFunds
Advertisement
Browse Grants on GrantFunds
Interested in applying for this grant?

Complete our application form to express your interest and we'll guide you through the process.

Apply for This Grant