Completed, Standard Grant, National Science Foundation (US)

CIF: Small: Statistically Optimal Subsampling for Big Data and Rare Events Data

$4.1M USD

Funder	National Science Foundation (US)
Recipient Organization	University of Connecticut
Country	United States
Start Date	Jun 15, 2021
End Date	May 31, 2025
Duration	1,446 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2105571`

Grant Description

The ever-increasing volume of big data offers unprecedented opportunities for advancing knowledge across scientific fields. However, traditional analyses of big data incur high computational costs and often require supercomputers. This project aims to develop computational tools that let practitioners analyze big data without depending on supercomputers.

The project produces optimal algorithms that extract the maximum amount of information from massive data with limited computing resources. Rare-events data are common in big data: the number of events of interest is relatively small even though the full data set is massive. The project identifies conditions under which most of the majority-class data can be discarded without any information loss, and it develops methods for valid analysis and appropriate decision-making with rare-events data.
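To make the idea of discarding most of the majority data concrete, here is a minimal sketch of a standard textbook technique, not the project's own method: keep every rare event, keep each non-event with a fixed probability, fit a logistic regression on the reduced data, and shift the intercept by the log of the keep rate. The simulated model, the constants `TRUE_A`, `TRUE_B`, `KEEP_RATE`, and the helper `fit_logistic` are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, iters=25):
    """Fit logit P(y=1|x) = a + b*x by Newton's method; return (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            v = p * (1.0 - p)
            g0 += y - p            # gradient w.r.t. intercept
            g1 += (y - p) * x      # gradient w.r.t. slope
            h00 += v               # entries of the 2x2 information matrix
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Simulated rare-events data (hypothetical model, for illustration only).
random.seed(0)
TRUE_A, TRUE_B = -4.0, 1.0
N = 50_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ys = [1 if random.random() < sigmoid(TRUE_A + TRUE_B * x) else 0 for x in xs]

# Keep all events; keep each non-event with probability KEEP_RATE.
KEEP_RATE = 0.05
sub = [(x, y) for x, y in zip(xs, ys) if y == 1 or random.random() < KEEP_RATE]
sub_x = [x for x, _ in sub]
sub_y = [y for _, y in sub]

# Fit on the reduced data, then undo the intercept shift caused by sampling:
# logit P(y=1 | x, kept) = logit P(y=1 | x) - log(KEEP_RATE).
a_sub, slope_hat = fit_logistic(sub_x, sub_y)
intercept_hat = a_sub + math.log(KEEP_RATE)

print(f"kept {len(sub_x)} of {N} rows")
print(f"intercept ~ {intercept_hat:.3f}, slope ~ {slope_hat:.3f}")
```

Because the non-events are dropped independently of `x`, only the intercept is affected by the sampling, which is why a single additive correction suffices in this simplified setting.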

Education is another key component of the project, with a significant focus on classroom integration and next-generation workforce training. The aim is to attract a broader range of participants, especially from underrepresented groups, to computational data science and to equip them to work in the field.

Subsampling has demonstrated broad potential to enable better use of a fixed amount of computing resources. However, existing investigations focus on computations on the collected data, and the available results are not suitable for statistical inference on the underlying model. This project develops and expands the subsampling technique in the following directions: 1) it establishes a framework to determine statistically optimal subsampling probabilities by examining the distributional properties of subsample estimators; 2) it derives the maximum subsampled conditional likelihood estimator, which has the smallest asymptotic variance among a large class of asymptotically unbiased estimators; and 3) it obtains new theoretical insights on rare-events data and challenges the long-standing view that probabilities of rare events are underestimated.
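As a rough illustration of what data-dependent subsampling probabilities can look like, the sketch below uses a two-step scheme for logistic regression: a small uniform pilot sample gives a preliminary estimate, each row then gets a sampling probability proportional to |y - p| times the norm of its covariate vector, and the final fit uses inverse-probability weights. This particular probability formula is one choice that appears in the subsampling literature and is an assumption here; the grant text does not specify it, and the weighted estimator shown is not the maximum subsampled conditional likelihood estimator the project describes. All names and constants are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, ws, iters=25):
    """Weighted Newton fit of logit P(y=1|x) = a + b*x; return (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, ws):
            p = sigmoid(a + b * x)
            r = w * (y - p)
            v = w * p * (1.0 - p)
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Simulated full data (hypothetical model, for illustration only).
random.seed(1)
TRUE_A, TRUE_B = -2.0, 1.0
N = 50_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ys = [1 if random.random() < sigmoid(TRUE_A + TRUE_B * x) else 0 for x in xs]

# Step 1: pilot estimate from a small uniform subsample.
pilot = random.sample(range(N), 500)
a0, b0 = fit_logistic([xs[i] for i in pilot], [ys[i] for i in pilot], [1.0] * 500)

# Step 2: probabilities proportional to |y - p_pilot| * ||(1, x)||.
scores = [abs(ys[i] - sigmoid(a0 + b0 * xs[i])) * math.sqrt(1.0 + xs[i] ** 2)
          for i in range(N)]
total = sum(scores)
probs = [s / total for s in scores]

# Step 3: sample with replacement, then fit with weights 1 / (N * prob).
K = 2000
idx = random.choices(range(N), weights=probs, k=K)
a_hat, b_hat = fit_logistic(
    [xs[i] for i in idx],
    [ys[i] for i in idx],
    [1.0 / (N * probs[i]) for i in idx],
)

print(f"pilot:     a ~ {a0:.3f}, b ~ {b0:.3f}")
print(f"subsample: a ~ {a_hat:.3f}, b ~ {b_hat:.3f}")
```

The inverse-probability weights are what keep the estimator targeted at the full-data model even though informative rows are deliberately oversampled; dropping them would bias the fit.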

The research is a significant addition to the field of big data subsampling and provides tools that are widely applicable to facilitate practical inference and decision-making. It also answers important questions that are essential for extracting valid information from rare-events data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Connecticut
