Completed STANDARD GRANT National Science Foundation (US)

Slow Kill for Big Data Learning

$1.7M USD

Funder	National Science Foundation (US)
Recipient Organization	Florida State University
Country	United States
Start Date	Sep 01, 2021
End Date	Aug 31, 2025
Duration	1,460 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2113599`

Grant Description

Big-data applications typically involve large numbers of samples and features and are often contaminated with outliers, which complicates both variable selection and parameter estimation. Fitting a sparse model with a prescribed cardinality is a common request in practice, but it requires solving a highly nonconvex, discrete optimization problem. Running such nonconvex optimization from multiple starting points is common, but it is often computationally prohibitive on big data; new cost-effective techniques are needed to relax the dependence on starting points while ensuring the best statistical accuracy.
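As an illustration, and assuming a squared-error loss purely for concreteness (the project is not limited to it), fitting a sparse model with prescribed cardinality can be written as follows. Here X is the n-by-p design matrix, y the response vector, and q the prescribed number of nonzero coefficients; this notation is introduced for the sketch and does not come from the award text.

```latex
\min_{\beta \in \mathbb{R}^{p}} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2
\quad \text{subject to} \quad \lVert \beta \rVert_0 \le q
```

Because the feasible set is a union of the C(p, q) coordinate subspaces of dimension q, the problem is nonconvex and combinatorial, and an iterative solver started from different points can stop at different local solutions.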

Moreover, how to adjust an arbitrary loss function to guard against gross outliers and achieve a high breakdown point is a major challenge for modern data analysis. The project will study innovative, efficient statistical methods and carry out rigorous theoretical analysis to answer these questions. Education is tightly coupled with research in this project through course development, student mentoring, outreach, and the recruitment of underrepresented students.

The project will propose a novel slow-kill technique for large-scale variable selection, motivated by a scalable optimization algorithm with iteration-varying thresholds and simultaneous L2-regularization. The three main elements of slow kill (progressive quantile control, a growing learning rate, and adaptive L2-shrinkage) have solid theoretical support, and the method's ability to reduce the problem size during iteration, unlike boosting and forward pathwise algorithms, makes it attractive for big data.
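A minimal sketch of how the three elements named above could fit together is given below. It is not the project's algorithm. It assumes a squared-error loss with a fixed ridge weight mu (the adaptive L2-shrinkage rule is not reproduced), a linear schedule that lowers the quantile q_t from p to q, and a step size recomputed from the spectral norm of the surviving columns, which is one simple way to obtain a learning rate that can only grow as features are killed. The function name slow_kill_sketch and all of its parameters are hypothetical.

```python
import numpy as np


def slow_kill_sketch(X, y, q, mu=1e-2, n_iter=50):
    """Hypothetical slow-kill-style solver (illustration only) for
        min (1/2n)||y - X b||^2 + (mu/2)||b||^2   subject to   ||b||_0 <= q.
    Assumed: squared loss, fixed ridge weight mu, linear quantile schedule.
    """
    n, p = X.shape
    if not 0 < q <= p:
        raise ValueError("q must satisfy 0 < q <= p")
    active = np.arange(p)  # indices of features that are still alive
    beta = np.zeros(p)     # coefficients of the alive features only
    for t in range(n_iter):
        # Progressive quantile control: the allowed cardinality q_t falls
        # linearly from p to q, so only a few features are killed per step.
        q_t = max(q, int(round(p - (p - q) * (t + 1) / n_iter)))
        Xa = X[:, active]
        # Step size 1/L_t from the curvature of the current subproblem.
        # Dropping columns cannot increase the spectral norm, so the
        # learning rate is nondecreasing over the iterations.
        L_t = np.linalg.norm(Xa, 2) ** 2 / n + mu
        eta = 1.0 / L_t
        # Gradient step on the squared loss plus the L2 (ridge) penalty.
        grad = Xa.T @ (Xa @ beta - y) / n + mu * beta
        z = beta - eta * grad
        # Quantile thresholding: keep the q_t largest |z_j| and permanently
        # discard the rest, shrinking the problem for later iterations.
        keep = np.sort(np.argsort(-np.abs(z))[:q_t])
        active = active[keep]
        beta = z[keep]
    full = np.zeros(p)
    full[active] = beta
    return full
```

Because killed columns are removed from X for every later iteration, each step works on a smaller matrix. Boosting and forward pathwise algorithms, by contrast, grow the model from the empty set.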

The interplay between statistics and optimization in the project will reveal tight error rates and fast convergence under certain regularity conditions, without the need to find a globally optimal solution. Furthermore, a framework of outlier-resistant estimation will be introduced to robustify a given method beyond the standard likelihood setup. The framework is closely connected to the method of trimming but introduces an explicit outlyingness parameter for every sample, which in turn facilitates both computation and theory.
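To illustrate the link to trimming, the sketch below assumes a squared-error loss and a mean-shift model y = X beta + gamma + noise, in which gamma_i is the explicit outlyingness parameter of sample i and at most k entries of gamma may be nonzero. It alternates a least-squares step for beta with a hard-thresholding step for gamma. This is one standard way to fit such a model, shown for illustration and not necessarily the project's procedure; fit_mean_shift, k, max_iter and tol are hypothetical names.

```python
import numpy as np


def fit_mean_shift(X, y, k, max_iter=500, tol=1e-10):
    """Hypothetical sketch: fit y = X @ beta + gamma + noise with ||gamma||_0 <= k.

    gamma[i] is an explicit outlyingness parameter for sample i; a nonzero
    gamma[i] absorbs that sample's residual, which is how trimming enters.
    """
    n, p = X.shape
    gamma = np.zeros(n)
    beta = np.zeros(p)
    for _ in range(max_iter):
        # beta-step: ordinary least squares on the outlier-adjusted response.
        beta_new, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        # gamma-step: keep the k largest-magnitude residuals, zero the rest.
        r = y - X @ beta_new
        gamma = np.zeros(n)
        if k > 0:
            idx = np.argpartition(np.abs(r), -k)[-k:]
            gamma[idx] = r[idx]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, gamma
```

At a fixed point, the normal equations reduce to least squares over the samples with gamma_i = 0, so beta coincides with the trimmed least-squares fit; the outlyingness parameters make the trimming explicit and turn it into a simple alternating update.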

With slow kill, the number of data resamplings will be substantially reduced, and the resulting resistant estimators can enjoy minimax rate optimality in both low and high dimensions. Overall, the proposed research will create a new-generation high-dimensional tool for robust sparse learning that can accommodate coherent designs and gross outliers in big data applications, deepening and broadening existing methods and theory in statistics and optimization.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Florida State University
