| Field | Value |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | Portland State University |
| Country | United States |
| Start Date | Sep 01, 2024 |
| End Date | Aug 31, 2027 |
| Duration | 1,094 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2426339 |
Reinforcement Learning (RL) is a machine learning paradigm that aims to make optimal decisions based on experience gained by acting in an environment. In many cases, the "environment" is a simulator during the training stage and the real world during the deployment stage. Training in a simulator brings several advantages: lower cost, greater safety, and more flexibility.
However, it is almost impossible to design a perfect simulator that is identical to the real world. Thus, a decision-maker trained in the simulator may not function well in the real world. The discrepancy between the simulator and the real world is called the simulation-to-reality (sim-to-real) gap.
This project will build new technologies to close the sim-to-real gap in both the training and the deployment stages. The research outcomes will benefit the development of next-generation RL techniques, improving the availability, applicability, and generalization of RL and narrowing the gap between common RL practice and real-world practice.
This project proposes to close the sim-to-real gap in reinforcement learning through three mechanisms: randomization, alignment, and derivation.
1) The randomization mechanism generates a set of homogeneous simulators by randomizing the parameters of the original simulator. The simulator set will cover a wider range of state-action regions than the original simulator, overlap more with the real-world environment, and therefore result in a smaller sim-to-real gap. This mechanism is especially useful when the sim-to-real gap is large and the simulator is accessible only for training the simulator-optimal policy, not during the sim-to-real transfer process.
2) The alignment mechanism makes the simulator more like the real world during the transfer process. It not only closes the sim-to-real gap but is also low-cost and efficient, thus accelerating the transfer process. This mechanism is especially useful when the sim-to-real gap is relatively small and the simulator is accessible during both simulator-optimal policy training and sim-to-real transfer.
3) The derivation mechanism derives an optimal policy directly from real-world offline data without any simulator. It first estimates state-action values from the offline data and then derives the policy by function approximation. This mechanism is especially useful when offline data has already been collected but the real-world dynamics are unknown, so building a faithful simulator is unlikely to be possible.
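To make the derivation mechanism more concrete, the following is a minimal sketch of the general idea of estimating state-action values from a fixed offline dataset and then reading a policy off those values. It is not the project's method. The five-state chain environment, the function names, and all hyperparameters are hypothetical, and a lookup table stands in for the function approximator mentioned in the abstract to keep the example small.

```python
import random
from collections import defaultdict

N_STATES = 5   # hypothetical chain: states 0..4, state 4 is the goal
N_ACTIONS = 2  # 0 = move left, 1 = move right


def collect_offline_data(n_steps=2000, seed=0):
    """Log transitions from a stand-in 'real world' under a random behavior policy.

    The learner below only ever sees the logged tuples, never this function.
    """
    rng = random.Random(seed)
    data, s = [], 0
    for _ in range(n_steps):
        a = rng.randrange(N_ACTIONS)
        s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        done = s_next == N_STATES - 1
        reward = 1.0 if done else 0.0
        data.append((s, a, reward, s_next, done))
        s = 0 if done else s_next  # restart the episode after reaching the goal
    return data


def estimate_q(data, gamma=0.9, sweeps=50, lr=0.5):
    """Estimate state-action values by repeated Q-learning sweeps over the fixed dataset."""
    q = defaultdict(lambda: [0.0] * N_ACTIONS)
    for _ in range(sweeps):
        for s, a, r, s_next, done in data:
            # Bootstrapped target; terminal transitions use the reward alone.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q


def derive_policy(q):
    """Derive a greedy policy: in each visited state, pick the highest-valued action."""
    return {s: max(range(N_ACTIONS), key=lambda a: values[a]) for s, values in q.items()}


if __name__ == "__main__":
    offline_data = collect_offline_data()
    q_values = estimate_q(offline_data)
    print(derive_policy(q_values))
```

No simulator is queried during learning: `estimate_q` and `derive_policy` work only from the logged transitions, which is the setting the derivation mechanism targets. In a realistic problem the table would be replaced by a learned function approximator, and extra care would be needed for state-action pairs that the offline data covers poorly.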
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.