| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | Harvard University |
| Country | United States |
| Start Date | Jul 01, 2025 |
| End Date | Jun 30, 2030 |
| Duration | 1,825 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2441652 |
AI and machine learning algorithms are transforming numerous scientific fields, with some of the most promising approaches relying on mathematical tools called "finite sample probability bounds." These bounds are crucial, for example, in reinforcement learning, which underpins the success of systems like AlphaGo. Additionally, they play a key role in uncertainty quantification and the theoretical analysis of black-box machine learning algorithms.
However, classical finite sample bounds have a significant limitation: they are often overly conservative. This conservatism leads to underperforming algorithms and unnecessarily loose guarantees. This project is built around a novel yet straightforward idea: finite-sample bounds can be derived from infinite-sample results.
By leveraging recent breakthroughs in optimal transport and probability theory, the project aims to develop new methods for deriving such inequalities. These methods will be applied to problems such as online data-driven decision-making, early stopping rules, and machine learning for multiscale physical models. The PI will interweave research and teaching throughout the award period and beyond.
In particular, the PI will provide research training opportunities to graduate students and develop undergraduate and graduate courses, with course materials made publicly available and with joint participation from industry.
Classical concentration inequalities, such as those of Hoeffding and Bernstein, are over-conservative, are regularly inadequate for heavy-tailed distributions, and often rely on the assumption of independence. In this project, the PI tackles these limitations from a new angle: starting with an infinite-sample result for a given problem, such as a central limit theorem, and translating it into a finite-sample result for the same problem by using the concept of distributional approximation.
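To make the contrast concrete (this illustration is not taken from the award text), a classical Chernoff-type bound and a distributional-approximation route can be written side by side for i.i.d. data; here \(\Phi\) is the standard normal CDF and \(C\) is the absolute constant in the Berry–Esseen theorem, a standard example of a distributional approximation:

```latex
% Hoeffding, for i.i.d. X_i taking values in [a, b]:
\Pr\!\Big(\tfrac{1}{n}\textstyle\sum_{i=1}^n (X_i - \mu) \ge t\Big)
  \le \exp\!\Big(-\tfrac{2nt^2}{(b-a)^2}\Big).

% Berry--Esseen, with \sigma^2 = \mathrm{Var}(X_i) and \rho = \mathbb{E}|X_i-\mu|^3 < \infty:
\sup_x \Big|\Pr\!\Big(\tfrac{\sum_{i=1}^n (X_i-\mu)}{\sigma\sqrt{n}} \le x\Big) - \Phi(x)\Big|
  \le \frac{C\rho}{\sigma^3\sqrt{n}},

% which yields a finite-sample tail bound directly from the Gaussian limit:
\Pr\!\Big(\tfrac{\sum_{i=1}^n (X_i-\mu)}{\sigma\sqrt{n}} \ge x\Big)
  \le 1 - \Phi(x) + \frac{C\rho}{\sigma^3\sqrt{n}}.
```

The last display is a finite-sample inequality obtained from an infinite-sample result (the CLT) plus a quantitative approximation error, which is the shape of argument the abstract describes.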
The advantage of this novel proof method is threefold: Firstly, the derived bounds improve as the sample size grows. This leads to inequalities that are considerably tighter than the classical ones. Secondly, limit theorems often hold for many forms of dependent data.
This opens a more promising path to derive finite sample probability bounds than conventional Chernoff-based techniques. Lastly, by extending this approach to non-Gaussian limits, the PI develops finite sample concentration inequalities for heavy-tailed statistics. This project will (1) develop a completely novel method for obtaining concentration inequalities, as well as new results for transport distances and optimal transport, (2) provide machine learning theorists with a new set of powerful probability tools for obtaining high-probability guarantees for high-dimensional estimators and dependent and structured data, and (3) provide tighter tail bounds which will lead to algorithmic improvements and improved uncertainty quantification.
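The following sketch is an illustration in the spirit of the abstract, not the project's actual method: it compares Hoeffding's bound with a finite-sample tail bound derived from the Gaussian limit via the Berry–Esseen theorem (using C = 0.4748, a known admissible value of the Berry–Esseen constant for i.i.d. sums). The example parameters (fair coin flips, n = 10,000) are my own choice.

```python
import math

def hoeffding_tail(n, t):
    """Hoeffding's bound for i.i.d. X_i in [0, 1]:
    P(sample mean - mu >= t) <= exp(-2 n t^2)."""
    return math.exp(-2 * n * t * t)

def clt_tail(n, t, sigma, rho, C=0.4748):
    """Finite-sample tail bound from the Gaussian limit:
    P(sample mean - mu >= t) <= 1 - Phi(t*sqrt(n)/sigma) + C*rho/(sigma^3 sqrt(n)),
    where the last term is the Berry-Esseen approximation error."""
    x = t * math.sqrt(n) / sigma
    gaussian_tail = 0.5 * math.erfc(x / math.sqrt(2))  # = 1 - Phi(x)
    return gaussian_tail + C * rho / (sigma ** 3 * math.sqrt(n))

# Fair coin flips: sigma = 0.5, rho = E|X - 1/2|^3 = 0.125.
n, t = 10_000, 0.01
print(hoeffding_tail(n, t))        # ~0.135
print(clt_tail(n, t, 0.5, 0.125))  # ~0.028, tighter in this moderate-deviation regime
```

Note that far in the tails the 1/sqrt(n) Berry–Esseen error term dominates and the classical bound can win again, which is precisely why sharper distributional approximations, such as those the project seeks via optimal transport, are needed.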
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.