Active STUDENTSHIP UKRI Gateway to Research

Applications of Reinforcement Learning in Long Term Planning for Robotics


Funder Engineering and Physical Sciences Research Council
Recipient Organization University of Oxford
Country United Kingdom
Start Date Sep 30, 2023
End Date Sep 29, 2027
Duration 1,460 days
Number of Grantees 2
Roles Student; Supervisor
Data Source UKRI Gateway to Research
Grant ID 2868363
Grant Description

Research Context

Whilst Reinforcement Learning (RL) has been shown to be effective when training occurs in environments identical or very similar to those in which agents are evaluated, agent performance tends to transfer poorly to unseen environments. Methods such as Unsupervised Environment Design (UED) have attempted to improve the generalisation ability of RL agents but have shown only limited success. Similarly, whilst RL has been very successful at short-timescale robot control tasks, longer-term control tasks in environments with large state and action spaces have presented a much greater challenge, especially for model-free approaches. Training agents using model-free policy gradient methods in these contexts tends to require large amounts of computation, and fails to take advantage of prior environment knowledge that model-based approaches such as Monte Carlo Tree Search (MCTS) are able to use. However, online planning methods such as MCTS tend to require much greater computation at runtime than pre-trained policies.

Aims and Objectives

The aim of this research is to examine whether it is possible to use RL methods to train policies that are capable of generalising to a wider scope of robotics environments and problem-solving tasks, whilst also using long-term planning methods to improve long-term decision-making by selecting optimal pre-trained policies based on a given environment state.

Novelty of the research methodology

Similar research has been done on using policy selection rather than action selection in long-term planning, such as the use of 'options' in MCTS. However, these options tend to be task-specific, sometimes hand-designed policies, and so fail to adapt to more diverse sets of environments. This research aims to train more general policies that are more robust to different environments.

Alignment to EPSRC's strategies and research areas

This research aligns with the EPSRC's Artificial Intelligence and Robotics research theme. It aims to use machine learning techniques to improve task planning in robotics environments.
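As a rough illustration of planning over policies rather than primitive actions, the sketch below runs a depth-limited exhaustive search over sequences of policies in a known model and returns the first policy of the best sequence. It is not the project's method and it is not MCTS: the 1-D environment, the reward values, the two hand-coded stand-in policies, and all names (`step`, `run_policy`, `select_policy`, `OPTION_LEN`, `HORIZON`) are assumptions made for the example.

```python
from itertools import product

GOAL = 9          # hypothetical goal position on a 1-D line of cells 0..15
OPTION_LEN = 3    # primitive steps executed per selected policy
HORIZON = 3       # number of policy choices the planner looks ahead


def step(state, action):
    """Assumed deterministic model: -1 per step, +10 on reaching the goal."""
    next_state = max(0, min(15, state + action))
    done = next_state == GOAL
    reward = 10.0 if done else -1.0
    return next_state, reward, done


# Hand-coded stand-ins for pre-trained policies (state -> primitive action).
POLICIES = {
    "left": lambda s: -1,
    "right": lambda s: +1,
}


def run_policy(state, name):
    """Roll one policy forward in the model for up to OPTION_LEN steps."""
    total, done = 0.0, False
    for _ in range(OPTION_LEN):
        state, reward, done = step(state, POLICIES[name](state))
        total += reward
        if done:
            break
    return state, total, done


def select_policy(state):
    """Return the first policy of the highest-return policy sequence."""
    best_name, best_return = None, float("-inf")
    for plan in product(POLICIES, repeat=HORIZON):
        s, ret = state, 0.0
        for name in plan:
            s, r, done = run_policy(s, name)
            ret += r
            if done:
                break
        if ret > best_return:
            best_name, best_return = plan[0], ret
    return best_name
```

Calling `select_policy(2)` picks the policy that moves towards the goal from the left, and `select_policy(14)` the one that approaches it from the right. A tree search such as MCTS would replace the exhaustive loop with sampled, selectively expanded rollouts, but the decision being made (which policy to run from the current state) is the same.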

All Grantees

University of Oxford
