Adversarial proximal policy optimisation for robust reinforcement learning

dc.contributor.author: Ince, Bilkan
dc.contributor.author: Shin, Hyo-Sang
dc.contributor.author: Tsourdos, Antonios
dc.date.accessioned: 2024-06-06T12:33:01Z
dc.date.available: 2024-06-06T12:33:01Z
dc.date.issued: 2024-01-04
dc.description.abstract: Robust reinforcement learning (RL) aims to develop algorithms that can effectively handle uncertainties and disturbances in the environment. Model-free methods play a crucial role in addressing these challenges by directly learning optimal policies without relying on a pre-existing model of the environment. This abstract provides an overview of model-free methods in robust RL, highlighting their key features, advantages, and recent advancements. First, we discuss the fundamental concepts of RL and its challenges in uncertain environments. We then delve into model-free methods, which operate by interacting with the environment and collecting data to learn an optimal policy. These methods typically use value-based or policy-based approaches to estimate the optimal action-value function or the policy directly, respectively. To enhance robustness, model-free methods often incorporate techniques such as exploration-exploitation strategies, experience replay, and reward shaping. Exploration-exploitation strategies facilitate the exploration of uncertain regions of the environment, enabling the discovery of more robust policies. Experience replay improves sample efficiency by reusing past experiences, allowing the agent to learn from a diverse set of situations. Reward shaping techniques provide additional guidance to the RL agent, enabling it to focus on relevant features of the environment and mitigate potential uncertainties. In this paper, a robust reinforcement learning methodology is developed utilising a novel Adversarial Proximal Policy Optimisation (A-PPO) method that integrates an adaptive KL penalty PPO. A comparison is made with DQN, DDQN and a conventional PPO algorithm. (An illustrative sketch of the adaptive KL penalty update follows this record.)
dc.description.sponsorship: This work is supported by Thales UK and EPSRC funding, grant number 2454266
dc.identifier.citation: Ince B, Shin HS, Tsourdos A. (2024) Adversarial proximal policy optimisation for robust reinforcement learning. In: AIAA SCITECH 2024 Forum, 8-12 January 2024, Orlando, USA. Paper number AIAA 2024-1697
dc.identifier.uri: https://doi.org/10.2514/6.2024-1697
dc.identifier.uri: https://dspace.lib.cranfield.ac.uk/handle/1826/21984
dc.language.iso: en_UK
dc.publisher: AIAA
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: model-free
dc.subject: AirSim/Unreal Engine
dc.subject: robust reinforcement learning (RL)
dc.subject: robust optimization
dc.title: Adversarial proximal policy optimisation for robust reinforcement learning
dc.type: Conference paper
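
The abstract above refers to integrating an adaptive KL penalty PPO into the proposed A-PPO. The following is a minimal Python sketch of that adaptive KL penalty mechanism as described in the original PPO paper (Schulman et al., 2017), not the authors' A-PPO implementation; the function names, the beta handling, and the kl_target value are illustrative assumptions.

import torch

def kl_penalty_loss(new_log_probs, old_log_probs, advantages, beta):
    # Surrogate objective of the KL-penalty PPO variant:
    # maximise E[ r_t(theta) * A_t - beta * KL(pi_old || pi_new) ].
    ratio = torch.exp(new_log_probs - old_log_probs)  # importance ratio r_t(theta)
    approx_kl = old_log_probs - new_log_probs          # simple per-sample KL estimate
    return -(ratio * advantages - beta * approx_kl).mean()

def adapt_beta(beta, mean_kl, kl_target=0.01):
    # Adaptive coefficient update: relax the penalty when the policy barely
    # moved, tighten it when the update overshot the KL target.
    if mean_kl < kl_target / 1.5:
        beta = beta / 2.0
    elif mean_kl > kl_target * 1.5:
        beta = beta * 2.0
    return beta

In this variant, beta would typically be updated once per policy iteration from the measured mean KL between the old and new policies, so the penalty strength tracks how far each update actually moves the policy.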

Files

Original bundle
Name: Adversarial_proximal_policy_optimisation-2024.pdf
Size: 1.67 MB
Format: Adobe Portable Document Format