Adversarial proximal policy optimisation for robust reinforcement learning

dc.contributor.author: Ince, Bilkan
dc.contributor.author: Shin, Hyo-Sang
dc.contributor.author: Tsourdos, Antonios
dc.date.accessioned: 2024-06-06T12:33:01Z
dc.date.available: 2024-06-06T12:33:01Z
dc.date.issued: 2024-01-04
dc.description.abstract: Robust reinforcement learning (RL) aims to develop algorithms that can effectively handle uncertainties and disturbances in the environment. Model-free methods play a crucial role in addressing these challenges by directly learning optimal policies without relying on a pre-existing model of the environment. This abstract provides an overview of model-free methods in robust RL, highlighting their key features, advantages, and recent advancements. First, we discuss the fundamental concepts of RL and its challenges in uncertain environments. We then delve into model-free methods, which operate by interacting with the environment and collecting data to learn an optimal policy. These methods typically use value-based or policy-based approaches to estimate the optimal action-value function or the policy directly, respectively. To enhance robustness, model-free methods often incorporate techniques such as exploration-exploitation strategies, experience replay, and reward shaping. Exploration-exploitation strategies facilitate the exploration of uncertain regions of the environment, enabling the discovery of more robust policies. Experience replay improves sample efficiency by reusing past experiences, allowing the agent to learn from a diverse set of situations. Reward shaping techniques provide additional guidance to the RL agent, enabling it to focus on relevant features of the environment and mitigate potential uncertainties. In this paper, a robust reinforcement learning methodology is developed utilising a novel Adversarial Proximal Policy Optimisation (A-PPO) method that integrates an adaptive KL penalty PPO. A comparison is made with DQN, DDQN and a conventional PPO algorithm. (An illustrative sketch of the adaptive KL penalty update follows this record.)
dc.description.sponsorship: This work is supported by Thales UK and EPSRC funding, grant number 2454266
dc.identifier.citation: Ince B, Shin HS, Tsourdos A. (2024) Adversarial proximal policy optimisation for robust reinforcement learning. In: AIAA SCITECH 2024 Forum, 8-12 January 2024, Orlando, USA. Paper number AIAA 2024-1697
dc.identifier.uri: https://doi.org/10.2514/6.2024-1697
dc.identifier.uri: https://dspace.lib.cranfield.ac.uk/handle/1826/21984
dc.language.iso: en_UK
dc.publisher: AIAA
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: model-free
dc.subject: AirSim/Unreal Engine
dc.subject: robust reinforcement learning (RL)
dc.subject: robust optimization
dc.title: Adversarial proximal policy optimisation for robust reinforcement learning
dc.type: Conference paper
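
The abstract above refers to integrating an adaptive KL penalty PPO into the proposed A-PPO. The following is a minimal Python sketch of that adaptive KL penalty mechanism as described in the original PPO paper (Schulman et al., 2017), not the authors' A-PPO implementation; the function names, the beta handling, and the kl_target value are illustrative assumptions.

import torch

def kl_penalty_loss(new_log_probs, old_log_probs, advantages, beta):
    # Surrogate objective of the KL-penalty PPO variant:
    # maximise E[ r_t(theta) * A_t - beta * KL(pi_old || pi_new) ].
    ratio = torch.exp(new_log_probs - old_log_probs)  # importance ratio r_t(theta)
    approx_kl = old_log_probs - new_log_probs          # simple per-sample KL estimate
    return -(ratio * advantages - beta * approx_kl).mean()

def adapt_beta(beta, mean_kl, kl_target=0.01):
    # Adaptive coefficient update: relax the penalty when the policy barely
    # moved, tighten it when the update overshot the KL target.
    if mean_kl < kl_target / 1.5:
        beta = beta / 2.0
    elif mean_kl > kl_target * 1.5:
        beta = beta * 2.0
    return beta

In this variant, beta would typically be updated once per policy iteration from the measured mean KL between the old and new policies, so the penalty strength tracks how far each update actually moves the policy.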

Files

Original bundle
Name: Adversarial_proximal_policy_optimisation-2024.pdf
Size: 1.67 MB
Format: Adobe Portable Document Format