Status: Active
Grant Type: Standard Grant

Data-adaptive Random Tessellations for Complex Data Analysis

$3.2M USD

Funder: National Science Foundation (US)
Recipient Organization: Johns Hopkins University
Country: United States
Start Date: Oct 01, 2024
End Date: Sep 30, 2027
Duration: 1,094 days
Number of Grantees: 1
Roles: Principal Investigator
Data Source: National Science Foundation (US)
Grant ID: 2402234
Grant Description

Developing accurate and interpretable models is crucial for the safety and efficacy of machine learning in an ever-increasing range of applications. To achieve state-of-the-art performance, algorithms rely on expensive and opaque optimization procedures that implicitly learn the most important features of the dataset to build the model. The complex nature of these algorithms impedes our ability to interpret the patterns in the data used to generate the output and to obtain mathematical performance guarantees.

This project will develop a library of fast and accurate machine-learning algorithms with interpretable mechanisms for learning the most relevant information from a dataset. This project will also create a corresponding mathematical toolkit for analyzing these algorithms to guide optimal implementation and provide statistical guarantees. These interpretable and theoretically justified algorithms will be of particular value for safety-critical applications in engineering and healthcare.

This project will be complemented by the mentorship of undergraduate and graduate research projects utilizing data science for the public good.

Many modern machine-learning algorithms generate complex models using random partitions of the available dataset. The most successful approaches, such as random forests and neural networks with piecewise linear activation functions, rely on optimization procedures that generate a data-adaptive partition, making the algorithm very difficult to analyze.
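To make the notion of a data-adaptive partition concrete, here is a minimal illustrative sketch (not the project's method): a single CART-style regression split whose threshold is chosen by optimizing squared error over the data, so the resulting cell boundaries depend on the sample itself.

```python
def best_split(xs, ys):
    """Return the 1-D threshold minimizing total squared error of a split.

    Because the threshold is optimized against (xs, ys), the induced
    partition is data-adaptive: change the data and the cut moves.
    """
    def sse(vals):
        # Sum of squared errors around the mean of a cell.
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best_t, best_err = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        err = sse(left) + sse(right)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Toy data with a jump between x = 0.3 and x = 0.7:
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0.0, 0.1, 0.0, 1.0, 0.9, 1.1]
print(best_split(xs, ys))  # the cut lands at the jump, at x = 0.3
```

A full tree applies this step recursively inside each cell, which is exactly what makes the final partition hard to analyze: every cut is a data-dependent optimum.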

On the other hand, purely random forests and random feature models generate random partitions of the feature space independently of the data. These methods are more amenable to theoretical analysis, but their performance and scalability suffer in the presence of large and high-dimensional datasets. This project will utilize and expand the toolkit of random tessellation processes in stochastic geometry to close the theoretical and computational gap between data-independent and data-adaptive random partitioning methods in machine learning.
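By contrast, a data-independent partition can be sampled without looking at the data at all. The following is a minimal sketch in the spirit of a Mondrian process, a standard random tessellation from stochastic geometry (an illustrative assumption, not the project's algorithm): axis-aligned cuts are drawn at random from the geometry of each box alone.

```python
import random

def mondrian(box, budget):
    """Recursively cut an axis-aligned box; return the list of leaf cells.

    box: list of (lo, hi) intervals, one per dimension.
    budget: remaining 'lifetime'; a larger budget yields a finer partition.
    No data appears anywhere: the partition is data-independent.
    """
    lengths = [hi - lo for lo, hi in box]
    rate = sum(lengths)  # total side length governs the cut rate
    cost = random.expovariate(rate)
    if cost > budget:
        return [box]  # budget exhausted: this box is a leaf cell
    # Pick a dimension proportional to its side length, then a uniform cut.
    d = random.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[d]
    cut = random.uniform(lo, hi)
    left, right = box.copy(), box.copy()
    left[d] = (lo, cut)
    right[d] = (cut, hi)
    remaining = budget - cost
    return mondrian(left, remaining) + mondrian(right, remaining)

cells = mondrian([(0.0, 1.0), (0.0, 1.0)], budget=2.0)
print(len(cells), "cells")  # the leaf cells tile the unit square
```

Because the cut distribution depends only on box geometry, quantities such as expected cell size can be computed exactly, which is the analytical tractability the abstract refers to; the cost is that the partition cannot concentrate cuts where the data is complex.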

This mathematical framework consists of expressive models for random partitions with parameters that will be learned from data and an extensive theory from which to develop a comprehensive understanding of the mathematical properties of the learned models. The goals of the project are to develop state-of-the-art random partitioning algorithms for data analysis, provide matching theoretical performance guarantees, and study fundamental statistical and computational trade-offs of data-adaptivity in the partitioning process.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Johns Hopkins University
