Active STANDARD GRANT National Science Foundation (US)

CSSI Elements: Multi-GPU and Network Modeling and Simulation in SST

$6M USD

Funder National Science Foundation (US)
Recipient Organization New Mexico State University
Country United States
Start Date Jan 01, 2025
End Date Dec 31, 2027
Duration 1,094 days
Number of Grantees 2
Roles Principal Investigator; Co-Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2411456
Grant Description

Multi-GPU (graphics processing unit) systems are the most common accelerator platform capable of handling large-scale problems in High Performance Computing (HPC), Machine Learning (ML), and Artificial Intelligence (AI). This project will develop a scalable and accurate multi-GPU simulation framework as part of the Structural Simulation Toolkit (SST) that will assist both computational scientists and the designers of the next generation of advanced computing systems.

Unlike existing solutions, this framework will provide computational scientists and analysts with the ability to estimate the gains and overheads associated with accelerating their applications, codebases, and workflows on GPU systems. It will also enable system designers to efficiently explore system alternatives. The developed simulator will furthermore enable the comparison of the performance of GPUs from different vendors.

The multi-GPU-SST will be publicized through a comprehensive strategy that includes in-person and online tutorials, training, and educational activities. The proposed framework will be shared through tutorials and workshops at conferences, ensuring that a wide range of computational scientists, analysts, and academics can benefit from and contribute to its development.

Through three tasks, the project will develop a multi-fidelity, multi-GPU simulation framework in SST. First, it will create a single GPU model in SST. Second, the project will address a critical need in the field by developing a multi-GPU simulation model that supports state-of-the-art GPU networking, interconnects, and communication. The proposed research will not only study but also actively work to resolve the scalability issues in simulating large-scale GPU systems, developing techniques to improve the simulation scalability and ensuring the framework's applicability to real-world scenarios. Third, it will create an AI-based model to enhance the performance of SST, especially the proposed GPU interconnection network model. This AI-based model will leverage machine learning algorithms to optimize the performance of the GPU interconnection network, thereby improving the overall efficiency and speed of the simulation framework.

The project will conduct a comprehensive performance analysis of emerging GPU workloads, including those from the Department of Energy's Exascale Computing Project, HPC applications, standard benchmarks, ML applications, and Large Language Models (LLMs). This analysis will provide valuable insights into the performance characteristics and potential areas for improvement of these workloads, thereby contributing to the advancement of GPU-accelerated computing in various fields.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

New Mexico State University
