Active CONTINUING GRANT National Science Foundation (US)

RAISE: Chip-to-chip photonic connectivity in multi-accelerator servers for ML

$9.7M USD

Funder National Science Foundation (US)
Recipient Organization Cornell University
Country United States
Start Date Oct 01, 2024
End Date Sep 30, 2027
Duration 1,094 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2444537
Grant Description

This RAISE project will develop new methods to connect multiple chips within computers using light instead of electrical wires. Using light to transfer data between chips can make data transfer faster and more energy efficient, which is crucial for working with large and complex data needed for societal applications like artificial intelligence, climate modeling, and biomedical research.

The project will closely engage with industry partners to facilitate adoption of the proposed research into practice. The close collaboration with industry will help train a new generation of scientists and engineers with interdisciplinary expertise. The skills and insights gained through this project will prepare them to tackle future challenges that lie at the intersection of multiple scientific fields, aligning with the NSF's mission to advance the frontiers of knowledge and innovation.

The project proposes to optically interconnect accelerators within compute servers using newly viable reconfigurable chip-to-chip optical interconnects. Today, by contrast, the commercial multi-accelerator compute servers that are the workhorses of machine learning use electrical interconnects to network the accelerator chips in a server. However, recent trends point to an interconnect bandwidth wall: accelerators are scaling an order of magnitude faster than the bandwidth of the interconnect between accelerators in the same server.

This has led to under-utilization and idling of Graphics Processing Unit (GPU) resources in cloud datacenters. Therefore, it is important to scale interconnect bandwidth in multi-accelerator servers to keep power-hungry and expensive accelerators adequately fed with data and parameters. This project will use novel silicon photonics to create optical interconnections between accelerators within a server to meet this need.

This research will benefit the complementary efforts of hyper-scale cloud providers by unlocking customized multi-accelerator topologies that achieve bandwidth-optimal collective communication between accelerators during distributed machine learning and can minimize the blast radius of accelerator failures.
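To give a rough sense of why interconnect bandwidth matters for collective communication, the sketch below estimates the bandwidth term of a ring all-reduce, a standard bandwidth-optimal collective in which each accelerator sends and receives 2(n-1)/n of the message. It is an illustration only and is not taken from the award: the function name, the 8-accelerator server, the 1 GB gradient size, and the two link speeds are all hypothetical, and per-message latency is ignored.

```python
def ring_allreduce_seconds(num_accelerators: int,
                           message_bytes: float,
                           link_bytes_per_second: float) -> float:
    """Bandwidth-only time estimate for a ring all-reduce.

    Each accelerator transfers 2 * (n - 1) / n of the message over its link
    (reduce-scatter followed by all-gather). Latency terms are ignored.
    """
    if num_accelerators < 2:
        return 0.0  # nothing to exchange with a single accelerator
    traffic = 2 * (num_accelerators - 1) / num_accelerators * message_bytes
    return traffic / link_bytes_per_second


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    gradients = 1e9  # 1 GB of gradients per step
    for label, bandwidth in [("slower link", 100e9), ("faster link", 400e9)]:
        t = ring_allreduce_seconds(8, gradients, bandwidth)
        print(f"{label}: {t * 1e3:.3f} ms per all-reduce")
```

Under these assumptions the communication time scales inversely with link bandwidth, so an accelerator whose compute outpaces its links spends a growing share of each training step waiting on the interconnect.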

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Cornell University
