Adversarial proximal policy optimisation for robust reinforcement learning

Date published

2024-01-04

Publisher

AIAA

Type

Conference paper

Citation

Ince B, Shin HS, Tsourdos A. (2024) Adversarial proximal policy optimisation for robust reinforcement learning. In: AIAA SCITECH 2024 Forum, 8-12 January 2024, Orlando, USA. Paper number AIAA 2024-1697

Abstract

Robust reinforcement learning (RL) aims to develop algorithms that can handle uncertainties and disturbances in the environment. Model-free methods play a central role in addressing these challenges because they learn policies directly, without relying on a pre-existing model of the environment. This abstract gives an overview of model-free methods in robust RL, highlighting their key features, advantages, and recent advances. We first discuss the fundamental concepts of RL and the challenges it faces in uncertain environments. We then examine model-free methods, which learn an optimal policy by interacting with the environment and collecting data. These methods typically follow either a value-based approach, which estimates the optimal action-value function, or a policy-based approach, which learns the policy directly. To improve robustness, model-free methods often incorporate techniques such as exploration-exploitation strategies, experience replay, and reward shaping. Exploration-exploitation strategies drive the agent into uncertain regions of the environment, enabling the discovery of more robust policies. Experience replay improves sample efficiency by reusing past experiences, allowing the agent to learn from a diverse set of situations. Reward shaping provides additional guidance to the agent, helping it focus on relevant features of the environment and mitigate potential uncertainties. In this paper, a robust RL methodology is adopted, utilising a novel Adversarial Proximal Policy Optimisation (A-PPO) method that integrates an adaptive KL-penalty PPO. The method is compared with DQN, DDQN and a conventional PPO algorithm.
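The "adaptive KL penalty PPO" mentioned above refers to the KL-penalty variant of PPO from the original PPO paper (Schulman et al., 2017), in which the surrogate objective is penalised by the KL divergence between the old and new policies and the penalty coefficient beta is halved or doubled depending on how far the policy moved. The sketch below illustrates only that standard update for a single batch, assuming log-probabilities of actions sampled from the old policy are already available. The function names and the default target KL of 0.01 are hypothetical choices for illustration, and the adversarial component of A-PPO is not shown; this is not the authors' implementation.

```python
import math
from typing import Sequence, Tuple


def kl_penalised_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    beta: float,
) -> Tuple[float, float]:
    """Return (objective, kl_estimate) for one batch of actions sampled from pi_old.

    objective = mean(ratio * advantage) - beta * KL(pi_old || pi_new),
    where ratio = pi_new(a|s) / pi_old(a|s). The objective is maximised.
    (Hypothetical helper for illustration.)
    """
    n = len(advantages)
    if n == 0 or not (n == len(logp_new) == len(logp_old)):
        raise ValueError("inputs must be non-empty and of equal length")
    surrogate = 0.0
    kl = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio pi_new / pi_old
        surrogate += ratio * adv
        # Monte Carlo estimate of KL(pi_old || pi_new) from actions drawn under pi_old
        kl += lp_old - lp_new
    surrogate /= n
    kl /= n
    return surrogate - beta * kl, kl


def update_beta(beta: float, kl: float, kl_target: float = 0.01) -> float:
    """Adaptive KL coefficient rule: halve beta if the policy moved too little,
    double it if it moved too much, otherwise keep it. kl_target is an assumed value."""
    if kl < kl_target / 1.5:
        return beta / 2.0
    if kl > kl_target * 1.5:
        return beta * 2.0
    return beta


if __name__ == "__main__":
    obj, kl = kl_penalised_objective([-0.9, -1.2], [-1.0, -1.0], [1.0, -0.5], beta=1.0)
    beta = update_beta(1.0, kl)
    print(f"objective={obj:.4f} kl={kl:.4f} next beta={beta}")
```

Adapting beta in this way keeps each policy update close to a target step size without the hard ratio clipping used in the more common clipped-PPO variant.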

Keywords

model-free, AirSim/Unreal Engine, robust reinforcement learning (RL), robust optimization

Rights

Attribution 4.0 International

Funder/s

This work is supported by Thales UK and EPSRC funding, grant number 2454266