Active STANDARD GRANT National Science Foundation (US)

CSR: Small: Cross-Layer Scheduling to Democratize AI for Foundation Model Inference Serving - An Algorithm-System Co-design Approach

$6M USD

Funder National Science Foundation (US)
Recipient Organization The University of Central Florida Board of Trustees
Country United States
Start Date Apr 01, 2025
End Date Mar 31, 2028
Duration 1,095 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2426368
Grant Description

Artificial Intelligence (AI) applications such as ChatGPT, which can create digital content such as images, text, and video, are considered promising for tasks like creative writing and software development. However, the complexity of these applications and their demand for costly computational resources put them out of reach for many people and organizations.

The project's novelty lies in software-hardware solutions that enable these AI applications to run efficiently on smaller, less powerful computer systems. Its broader significance is in making AI more available worldwide. The work also supports education by involving students in AI research and by creating public AI-related tools to benefit society.

This project addresses the critical challenge of running large foundation models, like those powering advanced AI applications, on smaller, resource-limited computer systems. By combining algorithmic and system-level innovations, the project seeks to improve efficiency, scalability, and resource utilization in foundation model inference, making AI more available.

This project consists of three research thrusts. First, the project tackles data transfer bottlenecks between graphics processing units (GPUs) and central processing units (CPUs) by designing a sparsity-aware scheduling system. This approach prioritizes important data (tokens) and dynamically manages memory to reduce overhead.

Second, the project optimizes how AI tasks are scheduled by introducing speculative execution for real-world workloads and adaptive memory management. Finally, the proposed solutions minimize GPU scheduling delays and cross-layer communication overhead using software-based solutions to improve kernel compilation. These innovations are expected to significantly reduce inference time and cut costs and energy consumption, while broadening access to AI tools for more people and organizations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

The University of Central Florida Board of Trustees
