Completed STANDARD GRANT National Science Foundation (US)

Collaborative Research: OAC Core: Improving Utilization of High-Performance Computing Systems via Intelligent Co-scheduling

$2.5M USD

Funder National Science Foundation (US)
Recipient Organization University of Arizona
Country United States
Start Date Sep 01, 2021
End Date Aug 31, 2025
Duration 1,460 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2103511
Grant Description

This project aims to increase the efficiency of high-performance computing systems by scheduling multiple jobs on the same set of nodes in a system, a practice generally called co-scheduling. This is a break from current practice, in which nodes are dedicated to one job at a time; that practice yields predictable execution time but inefficient use of system resources.

To make this practical, the project will develop analyses to determine how to carry out co-scheduling such that overall system efficiency is improved while the performance impact on individual applications is minimized. In particular, the goal is to co-schedule jobs that can co-exist without contending for similar resources on the nodes. The work done in this project will help achieve better efficiency on high-performance systems, which will impact application domains such as climate/weather, renewable energy, and national security.

The work will be implemented and validated on systems at Lawrence Livermore and Sandia National Laboratories and then transitioned into software that will be used at these national laboratories. The project will also have an impact on education by integrating the techniques in this research into courses covering parallel and distributed computing at the PIs' institutions.

In addition, the project will take place at two Hispanic-serving institutions, and the PIs have a history of advising under-represented students; the project will broaden participation in computing by recruiting Hispanic undergraduates to work on the project and sending them to national laboratories for internships.

The long-standing abstraction at high-end computing facilities is that a submitted job is allocated a set of dedicated nodes. However, this makes systems much less efficient, because the growing number of per-node resources will often be used inefficiently. In addition, demand for high-end systems is increasing, and dedicating nodes to jobs can increase job turnaround time and decrease overall system throughput.

One way to address this problem is for supercomputer centers to break from the current common practice of assigning each job a private, isolated portion of a supercomputer.

The intellectual merit of the project is three-fold. First, novel profile analyses will be developed that will reveal the effects on jobs due to sharing nodes. Second, novel statistical projection techniques will be developed that predict scaling behavior of jobs that are utilizing shared nodes. Third, new job-level scheduling techniques will be designed that use the interference analysis and projections to choose a set of shared nodes that will lead to good job turnaround time and maximize system throughput.

The broader impact of the project is multifold. This project will help achieve better efficiency on high-performance systems, which will benefit a broad range of applications that includes climate/weather prediction, nuclear energy, and national security. Through a long-standing collaboration with both Lawrence Livermore and Sandia National Laboratories, the PIs will implement and validate the techniques on LLNL and SNL systems as well as transition the techniques into future resource managers at the national laboratories.

In addition, both PIs will broaden participation in computing by recruiting Hispanic undergraduates to work on the project and sending them to national labs for internships.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Arizona
