Loading…
Loading grant details…
| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | University of California-Los Angeles |
| Country | United States |
| Start Date | Jan 01, 2024 |
| End Date | Dec 31, 2026 |
| Duration | 1,095 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2328974 |
In traditional Von Neumann computing systems, a significant bottleneck arises because the data transfer speed to and from the computing units has considerably fallen behind capacity, processing speed, and efficiency. To mitigate this bottleneck by bridging the gap between storage and computation, many innovative storage technologies have been introduced, along with near- and in-memory processing solutions designed for both emerging and traditional memory systems.
Nonetheless, a considerable challenge remains: the prototyping and characterization of actual fabricated systems, especially those encompassing both mature technologies and cutting-edge technologies. To overcome this challenge, this project develops a cutting-edge Retunable and Reconfigurable Acceleration Platform (R3AP) based on emerging racetrack memory, leveraging a device-architecture-application co-design approach.
The standout features of R3AP include its ability to function as a reconfigurable logic, a processing-in-memory (PIM) accelerator, and a high-density memory storage. It is retunable, meaning it can operate with bit-wise, integer, and floating-point precision, and can simulate analog-like storage and processing. R3AP effectively mitigates data movement inefficiencies while offering domain-specific acceleration and adaptability.
With its dense, reliable, energy-efficient, and ultra-low latency computational capability, R3AP has the potential to revolutionize the storage and processing capabilities of future computing systems, such as those in Internet of Things (IoT) and Cyber-Physical Systems (CPS). It can also be applied to high-performance and cloud computing systems. The project's findings are shared through publications, workshops, design contests, tutorials, industrial courses, and technology transfer activities.
Educational resources and outreach activity plans are made available on the project website, and software artifacts are released on GitHub.
To realize R3AP, the project comprises a series of interrelated research tasks spanning multiple system layers. At the device level, the project integrates the voltage-controlled skyrmion motion mechanism with the industrial-grade 8-inch wafer magnetic tunneling junction stack and demonstrates a fully functional Skyrmion racetrack memory (SRTM), including the formation, shifting, and detection of the skyrmion stream.
Additionally, it evaluates the performance of SRTM, focusing on aspects such as write-error-rate, shift-error-rate, read-error-rate, operation speed, and energy consumption. It also addresses and mitigates non-idealities, such as the pinning effect, and goes on to develop and demonstrate CMOS-integrated SRTM. On the architecture and circuit layers, the project involves the creation of a mutable lookup table, compute, and memory unit.
This unit performs like multi-context Field-Programmable Gate Array (FPGA) logic, parallel PIM logic, massively parallel accumulators, and analog-like storage and compute structures, leveraging the unique properties of SRTM. This layer ensures high-speed memory access from a hierarchy consisting of banks, subarrays, tiles, etc., and further adds links via configurable switch boxes and a mesh-based network-on-chip to enable data movement operations for PIM that would otherwise be challenging.
At the application layer, the project develops novel modeling, analysis, design space exploration, and runtime adjustment techniques to exploit the high degree of reconfigurability provided by R3AP. The goal is to adapt future IoT and CPS applications to changing environments and requirements, optimize resource usage, withstand external disturbances, and enhance overall system performance, resilience, and sustainability.
Across all these layers, the project develops a scalable computer-aided design (CAD) flow. This involves a multi-level intermediate representation-based compilation flow, which can compile high-level description languages such as PyTorch and C/C++ into binaries for the R3AP device. This flow uses a multi-level hierarchy including front-end, middle-end, and back-end compilation of the designs, and abstracts various optimization and management problems to a suitable level for efficient resolution.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
University of California-Los Angeles
Complete our application form to express your interest and we'll guide you through the process.
Apply for This Grant