
Active · Continuing Grant · National Science Foundation (US)

CAREER: Frontiers of Distributed Machine Learning with Communication, Computation and Data Constraints

$6.5M USD

Funder National Science Foundation (US)
Recipient Organization Carnegie-Mellon University
Country United States
Start Date Feb 01, 2021
End Date Jan 31, 2026
Duration 1,825 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2045694
Grant Description

The meteoric success of machine learning (ML) during the past decade can be attributed to the confluence of two key factors: big compute and big data. For example, although neural network models were proposed several decades ago, they came into the mainstream only after the advent of affordable cloud computing and the availability of massive training datasets.

The state-of-the-art approach to distributed training is to centrally shuffle data and then partition it across nodes equipped with powerful computation units and high-speed communication links. The indispensability of such communication-, compute-, and data-intensive frameworks precludes resource-limited organizations from using current ML algorithms.

This project seeks to democratize ML by enabling it to seamlessly scale to a network of computation-, communication-, and data-constrained nodes. Expected outcomes include distributed training and inference algorithms that are system-aware (robust to communication and computation limitations) and data-aware (can handle statistically skewed and scarce data).
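Distributed training across data-constrained nodes is often built on federated averaging. As an illustration only, and not this project's algorithms, the following minimal NumPy sketch trains a shared linear model across two clients whose feature distributions are statistically skewed; the helpers `local_sgd` and `federated_averaging` and all data here are hypothetical:

```python
import numpy as np

def local_sgd(w, X, y, lr=0.05, epochs=5):
    """A few epochs of full-batch gradient descent on one client's
    local least-squares problem (stand-in for local training)."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_averaging(w, clients, rounds=20):
    """Simplified FedAvg: every round, each client trains locally,
    and the server averages the returned models weighted by the
    size of each client's dataset."""
    total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        local_models = [local_sgd(w, X, y) for X, y in clients]
        w = sum(len(y) / total * wl
                for (_, y), wl in zip(clients, local_models))
    return w

# Two clients whose feature distributions differ (non-IID data).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X1 = rng.normal(0.0, 1.0, size=(50, 2))   # client 1
X2 = rng.normal(1.0, 1.0, size=(80, 2))   # client 2: shifted features
clients = [(X1, X1 @ true_w), (X2, X2 @ true_w)]
w = federated_averaging(np.zeros(2), clients)
```

Because both clients' local problems share the same minimizer, the averaged model converges to it despite the distribution shift; with more severe skew, local optima diverge and this is exactly where data-aware methods are needed.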

The research outcomes will be complemented by undergraduate, graduate, and high-school outreach classes and a monograph on large-scale learning. The investigator also aims to host an annual workshop for female researchers, creating collaboration and mentorship opportunities for women in STEM.

This project consists of three research thrusts related to communication, computation, and data constraints, respectively. The first thrust will develop several facets of communication efficiency in distributed training to achieve an order-of-magnitude reduction in training time. The second thrust will tackle computational heterogeneity, which can cause consistency and scalability issues in model training and inference.
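One widely studied route to communication efficiency, named here purely as an illustration rather than as this project's specific technique, is gradient sparsification: each worker transmits only its k largest-magnitude gradient coordinates instead of the full dense vector. A minimal top-k sketch:

```python
import numpy as np

def top_k_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of a gradient,
    zeroing the rest, so a worker can send k (index, value)
    pairs instead of the full dense gradient."""
    sparse = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse[idx] = grad[idx]
    return sparse

g = np.array([0.9, -0.05, 0.02, -1.2, 0.4, 0.01])
compressed = top_k_sparsify(g, k=2)
# Only the two largest-magnitude coordinates survive.
```

In practice such schemes are typically paired with error feedback, accumulating the dropped coordinates locally so they are not lost, which is one reason naive compression and careful algorithm design can differ by an order of magnitude in training time.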

The investigator will design distributed training, inference, and hyper-parameter optimization algorithms robust to such heterogeneity. The third thrust will address fundamental problems that stem from data heterogeneity, such as adaptive node selection, fairness, personalization, and data-scarce learning. Rather than improving the system and the algorithms in isolation, the investigator will take a system-aware and data-aware approach that draws novel insights from scheduling, coding theory, and multi-armed bandits.
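Adaptive node selection can be framed as a multi-armed bandit problem: each client is an arm, and the "reward" of selecting it might be, say, the loss reduction its update produces. As a hedged sketch of that framing only (the scoring rule and reward bookkeeping here are hypothetical, not the project's method), an upper-confidence-bound selector looks like:

```python
import numpy as np

def ucb_select(counts, rewards, t, c=2.0):
    """Choose the next client (arm) by an upper-confidence-bound
    score: prefer clients with a high average reward or with few
    past selections (exploration bonus)."""
    counts = np.asarray(counts, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    means = rewards / np.maximum(counts, 1.0)
    bonus = c * np.sqrt(np.log(t + 1.0) / np.maximum(counts, 1.0))
    # Never-selected clients get infinite score, so each is tried once.
    scores = np.where(counts == 0, np.inf, means + bonus)
    return int(np.argmax(scores))
```

After every client has been sampled at least once, the selector balances observed usefulness against exploration, which is the basic trade-off underlying bandit-based node selection.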

She will also collaborate with Google's federated learning team to validate the research outcomes and amplify their impact.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Carnegie-Mellon University
