Completed STANDARD GRANT National Science Foundation (US)

New Frontiers of Robust Statistics in the Era of Big Data

$2.36M USD

Funder National Science Foundation (US)
Recipient Organization University of Pittsburgh
Country United States
Start Date Jul 01, 2021
End Date Jun 30, 2025
Duration 1,460 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2113568
Grant Description

Modern technologies have facilitated the collection of an unprecedented number of features with complex structures. Although extensive progress has been made towards extracting useful information from massive data, statistical analysis typically assumes that data are drawn without any contamination. In reality, however, the data sets arising in applications such as genomics and medical imaging are usually more inhomogeneous, due to either the data collection process or the intrinsic nature of the data.

For instance, in gene expression data analysis, outliers frequently arise in microarray experiments due to array chip artifacts such as uneven spraying of reagents within arrays. Compared to the recent advances in the era of big data, research in modeling and theoretical foundations for robust procedures under contamination models has fallen behind.

To bridge this gap, this project seeks to develop new robust estimation and inference procedures which are rate-optimal for various contamination models as building blocks to address the modeling, theory and computational challenges. Upon completion, this work will lead to a comprehensive understanding of contamination models and have an immediate impact on various disciplines such as biology, genomics, astronomy and finance.

The project also provides training opportunities for undergraduate and graduate students, and will be used to enrich courses and outreach educational materials in statistics and data science.

This project aims to address some of the most pressing challenges faced by robust procedures in high-dimensional and nonparametric contamination models. Specifically:

(I) The research begins with statistical inference of low-dimensional parameters in both increasing-dimensional and high-dimensional regressions under contamination models. The PI will study the influence of the contamination proportion on obtaining root-n consistency results. Robust large-scale simultaneous inference under contamination models is also considered.

(II) Next, the PI will revisit some classical nonparametric density estimation problems under both arbitrary and structured contamination distributions. The PI plans to propose rate-optimal procedures and carefully study the effect of contamination on estimation through various model indices, including the contamination proportion, the structure of the contamination and the choice of loss function.

(III) The PI will develop a U-type robust covariance estimator under structured contamination models and provide rigorous theoretical guarantees on its rate optimality. This general robust estimator can serve as a building block for establishing many rate-optimal procedures for structured large covariance/precision matrix estimation problems.

User-friendly R packages will be developed to implement the proposed methods.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Pittsburgh
