Loading…
Loading grant details…
| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | University of California-San Diego |
| Country | United States |
| Start Date | Sep 01, 2023 |
| End Date | Aug 31, 2026 |
| Duration | 1,095 days |
| Number of Grantees | 2 |
| Roles | Principal Investigator; Former Co-Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2311833 |
Earthquake hazards pose potentially life-threatening risks to communities and cause significant economic damage. Numerical simulations of earthquakes on large-scale supercomputers are emerging as key to guiding the infrastructure and policy decisions as a result of earthquake modeling. These seismic and other codes including simulations involving Fast Fourier Transform (FFT) distribute the processing across a large number of compute nodes in a supercomputer.
Optimizing the communication between nodes is key to achieving good performance but it is a daunting task given the scale of execution. The MVAPICH communication library that implements the Message Passing Interface (MPI) and the TAU Performance System, a profiling tool to observe the communication, will be tightly coupled to assess the performance impact of tuning these codes during execution.
These libraries will share key performance parameters and optimize the communication in these applications to improve the time to solution. Performance-engineered versions of these codes will help drive the next generation of earthquake forecasting and help improve our understanding of seismic events to reduce risks to population centers and the environment.
The research will enable undergraduate and graduate curriculum advancements via research in pedagogy for High Performance Computing (HPC), Deep/Machine Learning, and Data Analytics courses. The results will also be disseminated to the collaborating organizations of the investigators to impact their HPC software applications.
Emerging HPC systems---driven by many-core processors and accelerator architectures--- require innovations in existing infrastructure to deliver the best performance for science domains. The MPI 4.0 standard has also brought forward new opportunities for co-designing applications. These include partitioned point-to-point and collective operations, and neighborhood collectives.
With these advances, there is a critical need to update the commonly used tools and libraries that form the basis for the NSF’s HPC cyberinfrastructure. The research undertakes this challenge and pursues new performance engineering avenues---by exploiting a co-design approach using the MPI_T API---in the MVAPICH2 and TAU libraries with scientific applications.
The project focuses on two popular HPC applications spanning multiple domains and representing various communication patterns - Anelastic Wave Propagation (AWP-ODC) and Highly efficient FFTs for Exascale (heFFTe). AWP-ODC is a highly scalable parallel finite-difference application with point-to-point operations that enables 3D earthquake calculations.
HeFFTe, dominated by collective operations, is a massively parallel application that provides a scalable and efficient implementation of the widely used Fast Fourier Transform (FFT) operations. The research aims to investigate and develop the following innovations by co-designing MVAPICH2 and TAU libraries to scale driving science domains---including AWP-ODC and heFFTe: 1) Load-aware designs for MPI asynchronous communication, 2) Cross runtime coordination for MPI+X applications, 3) Partitioned point-to-point primitives, 4) Application-aware neighborhood collective communication, 5) Support for adaptive persistent collective communication, and 6) Coordinating communication kernels on GPUs.
Integrated development and evaluation are carried out to ensure proper integration of proposed designs with the driving applications, and closely work with internal and external collaborators to facilitate wide deployment and adoption of the released software. The transformative impact of the proposed effort is to extract the performance and scalability of HPC applications in next-generation HPC architectures through intelligent performance engineering.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
University of California-San Diego
Complete our application form to express your interest and we'll guide you through the process.
Apply for This Grant