Loading…
Loading grant details…
| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | Stanford University |
| Country | United States |
| Start Date | Mar 15, 2023 |
| End Date | Feb 29, 2028 |
| Duration | 1,812 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2238006 |
Creating energy-efficient computing systems is essential for achieving the societal goal of sustainability. As the semiconductor technology scaling predicted by Moore’s law slows down, domain-specific hardware accelerators, i.e., hardware blocks specialized to do certain tasks very well, will play an increasingly important role in improving the performance and energy-efficiency of computing systems.
Modern mobile chips have dozens of accelerators for applications such as image processing, video coding, graphics, neural networks etc. to achieve low power consumption and fast processing speeds. However, with advances in machine learning, these applications are changing at a rapid pace. Maintaining high performance and energy-efficiency requires that hardware accelerators, compilers, and applications evolve together in lockstep.
Unfortunately, existing methodologies to achieve this involve significant manual effort. Large engineering teams study the accelerator architecture in detail and modify the compiler in an ad hoc manner, leveraging low-level libraries to target specific hardware features. Because of the large overhead in maintaining the software stack, it remains challenging to accelerate new domains or existing domains as they evolve.
What is needed is a structured approach for generating programmable accelerators and for updating the software compiler as the accelerator architecture evolves with the applications. This project proposes a design-space exploration and optimization framework that automatically generates accelerator architectures that approach the efficiencies of hand-designed ones, with a significantly lower design effort for both hardware and compiler generation.
This work can impact how hardware-software system design is done today in the industry, by reducing the time to market for products and creating more productive design teams. Moreover, the openly shared curriculum developed as a part of this work will ensure equitable access to educational opportunities and help create a diverse, globally competitive semiconductor workforce.
The research goal of this project is to create a framework for automated co-design and optimization of domain-specific hardware accelerators and compilers. The framework will have three components: (1) an automated accelerator processing element (PE) design space optimization tool based on frequent subgraph mining and merging, (2) an accelerator memory element optimization tool for both dense and sparse applications, and (3) an auto-scheduler for automatically determining the best mapping of an application to the accelerator.
These tools will be used to design, optimize and prototype in silicon a unified programmable accelerator for both dense application domains such as image processing and machine learning and sparse application domains such as graph analytics, and demonstrate energy-efficiency and performance metrics that significantly beat general purpose architectures and approach application-specific integrated circuits. The proposed approach uses several techniques, distinct from prior work, to achieve automatic accelerator-compiler co-design and optimization.
First, it allows any change in the hardware specification to automatically propagate into the compiler with no manual effort. This unique property is the key to enabling large-scale design space exploration of accelerators. Without it, one would have to manually update the application compiler with every hardware change, greatly limiting the number of design points one can explore.
Second, the proposed framework for automated PE optimization for accelerators generates efficient PE architectures from the application graphs themselves, using frequent subgraph mining and merging. This approach is quite different from prior work, which does not perform application-driven optimization but rather searches over many possible PE parameter values.
As a result, this approach promises to be much more sample-efficient and faster versus prior work. Finally, as opposed to existing commercial high level synthesis tools and compilers for programmable accelerators which require the user to provide low-level scheduling directives in the application code, the auto-scheduler proposed will automatically search for the best mapping of an application to the programmable accelerator, greatly improving user productivity.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Stanford University
Complete our application form to express your interest and we'll guide you through the process.
Apply for This Grant