| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | Santa Clara University |
| Country | United States |
| Start Date | Oct 01, 2023 |
| End Date | Sep 30, 2025 |
| Duration | 730 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2245352 |
Many modern applications, such as self-driving, security-threat detection, and image recognition, rely on machine learning (ML) models, which are statistical models built from large amounts of data to automatically solve a set of complex problems. ML models tend to be very complex, so they require very powerful computing hardware and large amounts of time, often minutes, to arrive at a solution.
However, modern applications, including those mentioned above, require decisions to be made in milliseconds in order to react to changes in the environment. Therefore, a major challenge in machine learning is to develop methods and computer systems that allow ML models to solve complex problems very quickly while minimizing the amount of hardware the models need.
Solving this challenge will not only increase the feasibility of using complex ML models in modern applications but also, for example, provide potential improvements for defense systems that strengthen our national security. Furthermore, this research directly feeds into the development of new computer systems courses and provides opportunities for a number of undergraduates, many for the first time, to participate in research.
This project focuses on the challenges of significantly reducing the time, defined as latency, that applications using ML models take to provide solutions. One approach the project will explore is to take models that traditionally run on a central processing unit (CPU) or a graphics processing unit (GPU) and run them instead on a domain-specific architecture (DSA) called a network processing unit (NPU).
NPUs are computational hardware units that exist in networking devices, such as network interface cards (NICs), and act as the gateway to data that enters and leaves a computer. The main motivation for using NPUs is to mitigate the overhead of passing data to CPUs or GPUs, thereby reducing the latency, CPU cycles, and memory spent on processing the data, while providing the performance guarantees that come with a reduced need for context switching.
The challenges for this approach are: (1) determining which types of ML applications can feasibly be offloaded onto NPUs with measurable improvements; (2) programming and deploying applications on NPUs in an efficient and scalable manner; and (3) guaranteeing predictable performance alongside existing traffic. This project will develop methods and a system for deploying ML applications onto NPUs and quantifying their benefits. It will focus on existing, simpler ML models, such as decision trees and logistic regression, to show the feasibility of the approach and to obtain preliminary metrics on performance improvements.
The entire work will be released as open-source, reusable libraries and applications for use by other researchers.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.