Completed STANDARD GRANT National Science Foundation (US)

CRII: CNS: System for Deploying Ultra Low-Latency Machine Learning Applications on Programmable Networks

$1.74M USD

Funder	National Science Foundation (US)
Recipient Organization	Santa Clara University
Country	United States
Start Date	Oct 01, 2023
End Date	Sep 30, 2025
Duration	730 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2245352`

Grant Description

Many modern applications, such as self-driving, security-threat detection, and image recognition, rely on machine learning (ML) models, which are statistical models built from large amounts of data to automatically solve a set of complex problems. ML models are often very complex, so they require powerful computing hardware and substantial time, sometimes minutes, to arrive at a solution.

However, modern applications, including those mentioned above, require decisions to be made in milliseconds in order to react to changes in the environment. Therefore, a major challenge in machine learning is to develop methods and computer systems that allow ML models to solve complex problems very quickly while minimizing the amount of hardware that the models need.

Solving such a challenge will not only increase the feasibility of using complex ML models for modern applications, but also, for example, provide potential improvements for defense systems that improve our national security. Furthermore, this research directly feeds into the development of new computer systems courses and provides opportunities for a number of undergraduates—many for the first time—to participate in research.

This project focuses on solving the challenges in significantly reducing the time (defined as latency) needed to provide solutions for applications that use ML models. One approach that this project will explore is to take models that traditionally run on a central processing unit (CPU) or a graphics processing unit (GPU) and run them on a domain-specific architecture (DSA) called a network processing unit (NPU).

NPUs are computational hardware that exist in networking devices, such as network interface cards (NICs), and act as the gateway to data that enters and leaves a computer. The main motivation for using NPUs is to mitigate the overhead of passing data to CPUs or GPUs, thereby reducing the latency, CPU cycles and memory spent on processing the data, while providing performance guarantees that come with the reduced need for context switching.

The challenges for this approach are: (1) determining the types of ML applications that are feasible to offload onto NPUs with measurable improvements; (2) programming and deploying applications on NPUs in an efficient and scalable manner; and (3) guaranteeing predictable performance in the presence of existing traffic. This project will focus on developing methods and a system for deploying ML applications onto NPUs and quantifying their benefits, focusing on existing, simpler ML models, such as decision trees and logistic regression, to show the feasibility of the approach and to obtain preliminary metrics on performance improvements.
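
To illustrate why a simple model such as a decision tree is a plausible candidate for offloading, the following minimal Python sketch stores a tree as a flat table and classifies using integer comparisons only. It is an illustration, not the project's method: the feature names (`packet_length`, `inter_arrival_us`), the thresholds, the `Node` layout, and the assumption that the target hardware favors integer-only, table-driven logic are all hypothetical.

```python
from typing import NamedTuple, Sequence


class Node(NamedTuple):
    feature: int    # index into the feature vector; -1 marks a leaf
    threshold: int  # integer threshold (used by internal nodes only)
    left: int       # next node index when value <= threshold
    right: int      # next node index when value > threshold
    label: int      # class label (used by leaves only)


# Hypothetical tree over two integer features:
# features[0] = packet_length, features[1] = inter_arrival_us
TREE = (
    Node(0, 512, 1, 2, -1),   # node 0: packet_length <= 512 ?
    Node(-1, 0, -1, -1, 0),   # node 1: leaf, class 0
    Node(1, 100, 3, 4, -1),   # node 2: inter_arrival_us <= 100 ?
    Node(-1, 0, -1, -1, 1),   # node 3: leaf, class 1
    Node(-1, 0, -1, -1, 0),   # node 4: leaf, class 0
)


def classify(features: Sequence[int], tree: Sequence[Node] = TREE) -> int:
    """Walk the flat table from the root until a leaf is reached."""
    idx = 0
    # The number of iterations is bounded by the depth of the tree.
    while tree[idx].feature != -1:
        node = tree[idx]
        if features[node.feature] <= node.threshold:
            idx = node.left
        else:
            idx = node.right
    return tree[idx].label
```

Each lookup is a bounded sequence of table reads and integer comparisons, which is the kind of predictable, low-overhead work the abstract associates with simpler models. Whether a given NPU can express it efficiently is exactly what challenge (1) asks.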

The entire work will be released as open-source, reusable libraries, and applications for use by other researchers.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Santa Clara University
