Completed STANDARD GRANT National Science Foundation (US)

SHF:Small: Data-Driven Thermal Monitoring and Run-Time Management for Manycore Processor and Chiplet Designs

$5M USD

Funder National Science Foundation (US)
Recipient Organization University of California-Riverside
Country United States
Start Date Oct 01, 2021
End Date Sep 30, 2025
Duration 1,460 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2113928
Grant Description

Today’s high-performance processors, and even emerging mobile platforms, are more thermally constrained than ever before because on-chip power densities continue to increase. Emerging chiplet-based heterogeneous integration further exacerbates these thermal problems, as stacked integration limits heat dissipation. Because the reliability of semiconductor chips degrades exponentially with rising temperature, thermal issues are among the leading concerns today.

Furthermore, long-term reliability is a significant challenge in the design of current nanometer integrated circuits (ICs). To address these challenges, run-time power, thermal, resource, and long-term reliability management schemes are being studied and implemented in most new generations of processors. However, many challenging problems remain unsolved, such as accurate full-chip run-time thermal and power estimation, workload-dependent true hot-spot detection and prediction, run-time control policies for true hot-spot reliability management, and more intelligent reliability-aware performance maximization in thermally constrained multi-/many-core processors and emerging chiplet designs.

At the same time, deep learning based on deep neural networks (DNNs) is gaining significant traction, as it provides new computing and optimization paradigms for many challenging and complex design-automation problems. The new techniques developed in this project will make future VLSI chips more robust and reliable amid continued aggressive transistor scaling and increasing power density.

This project will also contribute significantly to the core knowledge and technologies of emerging machine-learning-based approaches for full-chip power and thermal modeling, as well as run-time control and optimization techniques for multi-/many-core processors. This award will enable the investigator to engage more female and underrepresented minority students, further contributing to the diversity of the US science and technology workforce.

This project explores a new generation of data-driven, real-time thermal monitoring and smart run-time thermal, power, and reliability management techniques for commercial many-core processors by harnessing the latest advances in machine learning and numerical methods. First, the research will develop new data-driven, fast online full-chip thermal- and power-monitoring techniques for commercial many-core processors and emerging chiplet designs, accounting for practical heat-sink cooling conditions under arbitrary workloads.

The project will explore recent advances in DNNs such as recurrent neural networks (RNNs), conditional generative adversarial networks (CGANs), and graph neural networks (GNNs). Composable and scalable thermal modeling will also be explored for chiplet designs. Second, the project will explore learning-based thermal/power/reliability management for commercial many-core processors and chiplets, built on the proposed DNN-based thermal/power/reliability monitors and considering practical control approaches.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of California-Riverside
