
Active Studentship | UKRI Gateway to Research

Safe Reinforcement Learning approaches for optimal control in systems with complex and stochastic dynamics.


Funder Engineering and Physical Sciences Research Council
Recipient Organization University of Oxford
Country United Kingdom
Start Date Sep 30, 2024
End Date Mar 30, 2028
Duration 1,277 days
Number of Grantees 2
Roles Student; Supervisor
Data Source UKRI Gateway to Research
Grant ID 2922634
Grant Description

Autonomous systems are deployed globally across a wide range of applications, from assistive robotic systems in healthcare through to automated instrumentation and control in process plants. The ever-increasing deluge of data and the advancement of machine learning methods have made such autonomous systems more pervasive within society. As such, the need to design autonomous methods that can perform complex tasks in dynamic environments in a safe and controlled manner is of growing importance.

Reinforcement Learning (RL) is the main paradigm by which autonomous systems learn and operate. Within this paradigm, environments are mathematically modelled as Markov Decision Processes (MDPs), and RL algorithms synthesize policies that perform sequential decision making within those MDPs.
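As a concrete illustration of what "synthesizing a policy" for an MDP means, the sketch below runs value iteration on a small toy model and reads off a greedy optimal policy. The states, transition probabilities, and rewards are invented purely for illustration; they are not part of this project.

```python
import numpy as np

# Toy MDP (invented for illustration): 3 states, 2 actions.
# P[a][s][s'] is the probability of moving s -> s' under action a;
# R[s][a] is the immediate reward for taking action a in state s.
n_states, n_actions, gamma = 3, 2, 0.9
P = np.array([
    [[0.8, 0.2, 0.0],   # action 0
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]],
    [[0.1, 0.9, 0.0],   # action 1
     [0.0, 0.1, 0.9],
     [0.0, 0.0, 1.0]],
])
R = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# Value iteration: apply the Bellman optimality operator until it
# reaches a fixed point, then read off a greedy deterministic policy.
V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * np.einsum("ast,t->sa", P, V)   # Q[s, a]
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-8:
        V = V_new
        break
    V = V_new
policy = Q.argmax(axis=1)   # one action per state
print("optimal policy:", policy)
```

In this toy model the synthesized policy picks action 1 in states 0 and 1, steering towards the only rewarding transition; the same fixed-point principle underlies policy synthesis in the larger models the project considers.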

In more complex systems, a Partially Observable MDP (POMDP) is often used as the mathematical model for policy synthesis. This project aims to advance the field of safe RL in the context of control architectures for environments that exhibit complex and stochastic dynamics. Namely, this encompasses the formulation of policies that adhere to a set of restrictions over the duration of the decision-making process, such that autonomous agents behave safely whilst performing their duties. In particular, this project aims to answer the following research questions:

1. What is the impact of applying formal methods to Bayesian RL algorithms on agent performance in POMDPs?
2. Can probabilistic model verification be integrated into the architecture, to enable automatic verification of whether the constraints have been satisfied?
3. Is it possible to apply such a framework to continuous state and/or continuous action environments, which better represent real-world systems?

Answering these research questions should generate ideas and models that can be applied across a diverse range of applications, improving the feasibility and applicability of safe autonomous agents in the real world.

The novelty of the project stems from the integration of formal methods and Bayesian methods into the RL paradigm. Research in the application of Bayesian methods to RL has demonstrated that this is a powerful approach to improving agent performance in POMDPs. The integration of formal methods would enable synthesized policies to be restricted to obey pre-specified constraints defined in temporal logic. Finally, the novel incorporation of probabilistic model verification into the architecture would facilitate automatic checking that the synthesized policies do indeed satisfy the constraints described in a given specification. This would result in a novel algorithmic framework with which autonomous agents can be trained to perform a given task safely, irrespective of the environment or task.

This project falls within the following EPSRC research areas: Artificial Intelligence Technologies, Theoretical Computer Science, and Verification and Correctness. The project falls within the scope of the "Artificial Intelligence Technologies" research area as it aims to develop new methodologies for RL, one of the fundamental machine learning paradigms for automated reasoning and planning. Further, the project falls within the scope of the "Theoretical Computer Science" and "Verification and Correctness" research areas as it aims to integrate formal methods (linear temporal logic in particular) and probabilistic model verification to establish a training framework that enforces the adherence of synthesized policies to pre-specified constraints.

This project is supported by an industrial collaboration with Airbus, who are interested in the application of safe RL to space and other aeronautical domains.
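To make the constraint-checking idea concrete, the sketch below computes exact reachability probabilities on the Markov chain induced by a fixed policy, which is the core computation behind probabilistic model checkers such as PRISM and Storm, and compares the result against a safety threshold. The MDP, policies, and threshold are toy values invented for illustration, not the project's actual framework.

```python
import numpy as np

# Toy MDP (invented numbers): state 2 is "unsafe" and absorbing.
# P[a][s][s'] gives transition probabilities under action a.
P = np.array([
    [[0.9, 0.05, 0.05],
     [0.0, 1.0,  0.0],
     [0.0, 0.0,  1.0]],
    [[0.5, 0.45, 0.05],
     [0.0, 1.0,  0.0],
     [0.0, 0.0,  1.0]],
])
UNSAFE = [2]

def reach_prob(policy, start=0):
    """Exact probability of eventually reaching an unsafe state from
    `start` under a fixed deterministic policy, via fixed-point
    iteration on the policy-induced Markov chain."""
    n = P.shape[1]
    P_pi = P[np.asarray(policy), np.arange(n)]  # P_pi[s][s']
    # x[s] = 1 for unsafe s, else x[s] = sum_{s'} P_pi[s, s'] * x[s'].
    x = np.zeros(n)
    x[UNSAFE] = 1.0
    for _ in range(10_000):
        x_new = P_pi @ x
        x_new[UNSAFE] = 1.0
        done = np.abs(x_new - x).max() < 1e-12
        x = x_new
        if done:
            break
    return x[start]

# Check the constraint "P(reach unsafe) <= 0.2" for two candidate policies.
threshold = 0.2
for policy in ([0, 0, 0], [1, 0, 0]):
    p = reach_prob(policy)
    verdict = "satisfied" if p <= threshold else "violated"
    print(f"policy {policy}: P(reach unsafe) = {p:.2f} -> {verdict}")
```

Here the first policy violates the constraint (reaching the unsafe state with probability 0.5) while the second satisfies it (probability 0.1); in the envisaged framework this kind of verdict would be produced automatically, for constraints stated in temporal logic, as part of training.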

All Grantees

University of Oxford
