Adversarial proximal policy optimisation for robust reinforcement learning

Date

2024-01-04

Publisher

AIAA

Type

Conference paper

Citation

Ince B, Shin HS, Tsourdos A. (2024) Adversarial proximal policy optimisation for robust reinforcement learning. In: AIAA SCITECH 2024 Forum, 8-12 January 2024, Orlando, USA. Paper number AIAA 2024-1697

Abstract

Robust reinforcement learning (RL) aims to develop algorithms that can effectively handle uncertainties and disturbances in the environment. Model-free methods play a crucial role in addressing these challenges by directly learning optimal policies without relying on a pre-existing model of the environment. This abstract provides an overview of model-free methods in robust RL, highlighting their key features, advantages, and recent advancements. Firstly, we discuss the fundamental concepts of RL and its challenges in uncertain environments. We then delve into model-free methods, which operate by interacting with the environment and collecting data to learn an optimal policy. These methods typically use value-based approaches to estimate the optimal action-value function or policy-based approaches to learn the policy directly. To enhance robustness, model-free methods often incorporate techniques such as exploration-exploitation strategies, experience replay, and reward shaping. Exploration-exploitation strategies facilitate the exploration of uncertain regions of the environment, enabling the discovery of more robust policies. Experience replay improves sample efficiency by reusing past experiences, allowing the agent to learn from a diverse set of situations. Reward shaping provides additional guidance to the RL agent, enabling it to focus on relevant features of the environment and mitigate potential uncertainties. In this paper, a robust reinforcement learning methodology is developed, utilising a novel Adversarial Proximal Policy Optimisation (A-PPO) method that integrates an adaptive KL-penalty PPO. Comparisons are made with DQN, DDQN, and a conventional PPO algorithm.
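
To illustrate the kind of update the abstract refers to, the following is a minimal sketch of a PPO step with an adaptive KL penalty, combined with an adversarial perturbation of the observations. This is not the authors' A-PPO implementation: the network architecture, the FGSM-style observation adversary, and all hyperparameters (eps, beta, kl_target, learning rate) are assumptions introduced here for illustration only.

```python
import torch
import torch.nn as nn


class Policy(nn.Module):
    # Small Gaussian policy over continuous actions (assumed architecture).
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        return torch.distributions.Normal(self.net(obs), self.log_std.exp())


def adversarial_obs(policy, obs, actions, eps=0.05):
    # Illustrative adversary (an assumption, not necessarily the paper's):
    # an FGSM-style perturbation of the observations that lowers the
    # log-probability of the actions the behaviour policy actually took.
    obs = obs.clone().requires_grad_(True)
    logp = policy.dist(obs).log_prob(actions).sum(-1).mean()
    grad, = torch.autograd.grad(logp, obs)
    return (obs - eps * grad.sign()).detach()


def ppo_kl_update(policy, old_policy, optimiser, obs, actions, advantages,
                  beta=1.0, kl_target=0.01, epochs=10):
    # PPO with an adaptive KL penalty: maximise the importance-weighted
    # advantage minus beta * KL(old || new), then adjust beta so the
    # realised KL tracks kl_target (the rule from the original PPO paper).
    for _ in range(epochs):
        new_dist = policy.dist(obs)
        with torch.no_grad():
            old_dist = old_policy.dist(obs)
        ratio = (new_dist.log_prob(actions).sum(-1)
                 - old_dist.log_prob(actions).sum(-1)).exp()
        kl = torch.distributions.kl_divergence(old_dist, new_dist).sum(-1).mean()
        loss = -(ratio * advantages).mean() + beta * kl
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
    if kl.item() < kl_target / 1.5:
        beta = beta / 2.0
    elif kl.item() > kl_target * 1.5:
        beta = beta * 2.0
    return beta


if __name__ == "__main__":
    obs_dim, act_dim, batch = 8, 2, 256          # placeholder dimensions
    policy, old_policy = Policy(obs_dim, act_dim), Policy(obs_dim, act_dim)
    old_policy.load_state_dict(policy.state_dict())
    optimiser = torch.optim.Adam(policy.parameters(), lr=3e-4)

    obs = torch.randn(batch, obs_dim)            # stand-in for rollout observations
    actions = old_policy.dist(obs).sample()
    advantages = torch.randn(batch)              # stand-in for advantage estimates

    perturbed = adversarial_obs(old_policy, obs, actions)
    beta = ppo_kl_update(policy, old_policy, optimiser, perturbed,
                         actions, advantages)
    print("updated KL coefficient:", beta)
```

In a full training loop, the rollout data and advantage estimates would come from interaction with the environment (for example an AirSim/Unreal Engine simulation, as the keywords suggest), and the adversarial perturbation step would be applied to each collected batch before the policy update.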

Keywords

model-free, AirSim/Unreal Engine, robust reinforcement learning (RL), robust optimization

Rights

Attribution 4.0 International

Funder/s

This work is supported by Thales UK and EPSRC funding, grant number 2454266.