| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | University of Connecticut |
| Country | United States |
| Start Date | Jun 15, 2021 |
| End Date | May 31, 2025 |
| Duration | 1,446 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2105571 |
The ever-increasing amounts of big data offer unprecedented opportunities for advancing knowledge across scientific fields. However, traditional analyses of big data involve high computational costs and often require supercomputers. This project aims to develop computational tools that empower practitioners to analyze big data without dependency on supercomputers.
It produces optimal algorithms that extract the maximum amount of information from massive data with limited computing resources. Rare-events data are common in big data: the number of events of interest is relatively small even though the full data set is massive. This project identifies conditions under which the majority of the data can be discarded without any information loss, and develops methods for valid analysis and appropriate decision-making with rare-events data.
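To make the idea of discarding most of the majority class concrete, here is a minimal sketch in Python of one standard approach for logistic regression: keep every rare event, keep each non-event with probability `rho`, and add `log(rho)` to the fitted intercept to undo the sampling. It is an illustration under stated assumptions, not the project's own method. The simulated data, the true coefficients, the value of `rho`, and all function names are hypothetical, and the sketch does not demonstrate the "no information loss" conditions the project studies.

```python
import numpy as np


def fit_logistic(X, y, n_iter=30):
    """Unpenalized logistic regression by Newton's method.

    X must already contain a leading column of ones for the intercept.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fit_with_undersampling(X, y, rho, rng):
    """Keep all events (y == 1) and each non-event with probability rho."""
    keep = (y == 1) | (rng.random(y.shape[0]) < rho)
    beta = fit_logistic(X[keep], y[keep])
    # Dropping non-events raises the log odds in the subsample by -log(rho),
    # so adding log(rho) to the intercept maps it back to the full-data scale.
    beta[0] += np.log(rho)
    return beta, int(keep.sum())


def simulate_rare_events(n, beta_true, rng):
    """Hypothetical data: one standard normal covariate, rare positive labels."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    p = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
    y = (rng.random(n) < p).astype(float)
    return X, y


rng = np.random.default_rng(0)
X, y = simulate_rare_events(200_000, np.array([-5.0, 1.0]), rng)
beta_hat, n_kept = fit_with_undersampling(X, y, rho=0.05, rng=rng)
print("event rate:", y.mean())
print("rows kept:", n_kept, "of", len(y))
print("corrected estimate:", beta_hat)
```

In this setup the fit uses only a small fraction of the rows, yet the corrected coefficients should land close to the values used to simulate the data, because the rare events carry most of the information about the model.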
Education is another key component of the project, with a significant focus on classroom integration and next-generation workforce training. The aim is to attract a broader range of participants, especially from underrepresented groups, to the field of computational data science and to equip them to work in it.
Subsampling has shown broad potential for making better use of a fixed amount of computing resources. However, existing investigations focus on computations on the collected data, and the available results are not suitable for statistical inference on the underlying model. This project develops and expands the subsampling technique in the following directions: 1) it establishes a framework to determine statistically optimal subsampling probabilities by examining the distributional properties of subsample estimators; 2) it derives the maximum subsampled conditional likelihood estimator, which has the smallest asymptotic variance among a large class of asymptotically unbiased estimators; and 3) it obtains new theoretical insights on rare-events data and challenges the long-standing view that probabilities of rare events are underestimated.
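The general shape of nonuniform subsampling can be sketched as a two-step procedure: fit a pilot model on a small uniform sample, use it to assign each row a sampling probability, then fit an inverse-probability-weighted model on the resulting subsample. The Python sketch below assumes a logistic regression model and probabilities proportional to `|y - p| * ||x||`, which is one choice discussed in the subsampling literature. It is not the maximum subsampled conditional likelihood estimator named above, and every function name, sample size, and coefficient here is hypothetical.

```python
import numpy as np


def weighted_logistic(X, y, w, n_iter=30):
    """Weighted logistic regression by Newton's method (X includes intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def two_step_subsample(X, y, n_pilot, n_sub, rng):
    """Pilot fit, then nonuniform subsampling with inverse-probability weights."""
    n = X.shape[0]
    # Step 1: pilot estimate from a small uniform sample.
    pilot = rng.choice(n, size=n_pilot, replace=False)
    beta_pilot = weighted_logistic(X[pilot], y[pilot], np.ones(n_pilot))
    # Step 2: probabilities proportional to |y - p| * ||x|| (assumed choice).
    # This step still needs one pass over the full data.
    p_pilot = 1.0 / (1.0 + np.exp(-(X @ beta_pilot)))
    score = np.abs(y - p_pilot) * np.linalg.norm(X, axis=1)
    probs = score / score.sum()
    # Step 3: sample with replacement and weight each draw by 1 / probability.
    idx = rng.choice(n, size=n_sub, replace=True, p=probs)
    weights = 1.0 / probs[idx]
    weights = weights / weights.mean()  # rescaling does not change the estimate
    return weighted_logistic(X[idx], y[idx], weights), probs


rng = np.random.default_rng(0)
n = 100_000
beta_true = np.array([0.5, 1.0, -1.0])  # hypothetical coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)
beta_hat, probs = two_step_subsample(X, y, n_pilot=1_000, n_sub=5_000, rng=rng)
print("subsample estimate:", beta_hat)
```

The inverse-probability weights are what keep the subsample estimator targeting the full-data estimate; the choice of probabilities only affects its variance, which is the quantity an optimality framework would minimize.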
The research is a significant addition to the field of big data subsampling and provides tools that are widely applicable to facilitate practical inference and decision-making. It also answers important questions that are essential for extracting valid information from rare-events data.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.