| Funder | Engineering and Physical Sciences Research Council |
|---|---|
| Recipient Organization | University of Oxford |
| Country | United Kingdom |
| Start Date | Sep 30, 2023 |
| End Date | Sep 29, 2027 |
| Duration | 1,460 days |
| Number of Grantees | 2 |
| Roles | Student; Supervisor |
| Data Source | UKRI Gateway to Research |
| Grant ID | 2868363 |
Research Context

Whilst Reinforcement Learning (RL) has been shown to be effective when training occurs in environments identical or very similar to those in which agents are evaluated, agent performance tends to transfer poorly to unseen environments. Methods such as Unsupervised Environment Design (UED) have attempted to improve the generalisation ability of RL agents but have shown only limited success. Similarly, whilst RL has been very successful at short-timescale robot control tasks, longer-term control tasks in environments with large state and action spaces have presented a much greater challenge, especially for model-free approaches. Training agents with model-free policy gradient methods in these contexts tends to require large amounts of computation, and fails to take advantage of the prior environment knowledge that model-based approaches such as Monte Carlo Tree Search (MCTS) are able to use. However, online planning methods such as MCTS tend to require much more computation at runtime than pre-trained policies.

Aims and Objectives

The aim of this research is to examine whether RL methods can be used to train policies that generalise to a wider range of robotics environments and problem-solving tasks, whilst also using long-term planning methods to improve long-term decision-making by selecting the best pre-trained policy for a given environment state.

Novelty of the research methodology

Similar research has used policy selection rather than action selection in long-term planning, for example through the use of 'options' in MCTS. However, these options tend to be task-specific, sometimes hand-designed, policies, and so fail to adapt to more diverse sets of environments. This research aims to train more general policies that are more robust to different environments.

Alignment to EPSRC's strategies and research areas

This research aligns with the EPSRC's Artificial Intelligence and Robotics research theme. It aims to use machine learning techniques to improve task planning in robotics environments.