Browsing by Author "Yang, Lei"
Now showing 1 - 5 of 5
Item (Open Access): Demand and capacity balancing technology based on multi-agent reinforcement learning (IEEE, 2021-11-15)
Authors: Chen, Yutong; Xu, Yan; Hu, Minghua; Yang, Lei
Abstract: To effectively solve Demand and Capacity Balancing (DCB) in large-scale, high-density scenarios through the Ground Delay Program (GDP) at the pre-tactical stage, a sequential decision-making framework based on a time window is proposed. On this basis, the problem is transformed into a Markov Decision Process (MDP) based on local observation, and a Multi-Agent Reinforcement Learning (MARL) method is adopted. Each flight is regarded as an independent agent that decides whether to implement GDP according to its local state observation. By designing the reward function in multiple combinations, a Mixed Competition and Cooperation (MCC) mode considering fairness is formed among the agents. To improve the efficiency of MARL, we use the double Deep Q-learning Network (DQN), experience replay, an adaptive ε-greedy strategy and the Decentralized Training with Decentralized Execution (DTDE) framework. The experimental results show that the training process of the MARL method is convergent, efficient and stable. Compared with the Computer-Assisted Slot Allocation (CASA) method used in actual operations, the number of flight delays and the average delay time are reduced by 33.7% and 36.7% respectively.

Item (Open Access): General multi-agent reinforcement learning integrating adaptive manoeuvre strategy for real-time multi-aircraft conflict resolution (Elsevier, 2023-04-12)
Authors: Chen, Yutong; Hu, Minghua; Yang, Lei; Xu, Yan; Xie, Hua
Abstract: Reinforcement learning (RL) techniques are under investigation for resolving conflicts in air traffic management (ATM), exploiting their computational capabilities and ability to cope with flight uncertainty. However, limited generalisation makes it difficult for existing RL-based conflict resolution (CR) methods to be effective in practice. This paper proposes a general multi-agent reinforcement learning (MARL) method that integrates an adaptive manoeuvre strategy to enhance both the solution's efficiency and the model's generalisation in multi-aircraft conflict resolution (MACR). A partial observation approach based on imminent-threat detection sectors is used to gather critical environmental information, enabling the model to be applied in arbitrary scenarios. Agents are trained to provide the correct flight intention (such as increasing speed or yawing to the left), while an adaptive manoeuvre strategy generates the specific manoeuvre (speed and heading parameters) from that intention. To address flight uncertainty and the performance challenges caused by the intrinsic non-stationarity of MARL, a warning area for each aircraft is introduced. We employ a state-of-the-art Deep Q-learning Network (DQN) method, Rainbow DQN, to improve the efficiency of the RL algorithm. The multi-agent system is trained and deployed in a distributed manner to adapt to real-world scenarios. A sensitivity analysis of uncertainty levels and warning area sizes is conducted to explore their impact on the proposed method. Simulation experiments confirm the effectiveness of the training and the generalisation of the proposed method.
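The first abstract above names double DQN and an adaptive ε-greedy exploration strategy as efficiency measures. As a point of reference, here is a minimal Python sketch of those two ingredients; the decay rule, hyperparameter values and the two-action flight agent are illustrative assumptions, not the paper's actual settings.

    # Minimal sketch: double-DQN TD target and adaptive epsilon-greedy schedule.
    # All hyperparameter values below are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    def epsilon(step, eps_start=1.0, eps_end=0.05, decay=5_000):
        """Adaptive epsilon-greedy: exploration decays as training proceeds."""
        return eps_end + (eps_start - eps_end) * np.exp(-step / decay)

    def double_dqn_target(q_online_next, q_target_next, reward, done, gamma=0.99):
        """Double DQN: the online net selects the action, the target net evaluates it."""
        a_star = int(np.argmax(q_online_next))    # action selection (online net)
        bootstrap = q_target_next[a_star]         # action evaluation (target net)
        return reward + gamma * (1.0 - done) * bootstrap

    # Hypothetical flight agent with two actions (0 = no delay, 1 = apply GDP delay)
    q_online_next = rng.normal(size=2)   # online-net Q-values at the next state
    q_target_next = rng.normal(size=2)   # target-net Q-values at the next state
    y = double_dqn_target(q_online_next, q_target_next, reward=-1.0, done=0.0)
    print(f"eps at step 1000: {epsilon(1000):.3f}, TD target: {y:.3f}")

Decoupling action selection from action evaluation in this way is what counteracts the overestimation bias of vanilla Q-learning, which is why the abstract pairs it with experience replay for stable training.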
Item (Open Access): General real-time three-dimensional multi-aircraft conflict resolution method using multi-agent reinforcement learning (Elsevier, 2023-10-10)
Authors: Chen, Yutong; Xu, Yan; Yang, Lei; Hu, Minghua
Abstract: Reinforcement learning (RL) techniques have been studied for solving the conflict resolution (CR) problem in air traffic management, leveraging their computational potential and ability to handle uncertainty. However, challenges remain that impede the application of RL methods to CR in practice, including three-dimensional manoeuvres, generalisation, trajectory recovery, and success rate. This paper proposes a general multi-agent reinforcement learning approach for real-time three-dimensional multi-aircraft conflict resolution, in which agents share a neural network and are deployed on each aircraft to form a distributed decision-making system. To address these challenges, several techniques are introduced: a partial observation model based on imminent threats for generalisation, a safety-separation relaxation model across multiple flight levels for three-dimensional manoeuvres, an adaptive manoeuvre strategy for trajectory recovery, and a conflict buffer model for success rate. The Rainbow Deep Q-learning Network (DQN) is used to enhance the efficiency of the RL process. A simulation environment that considers flight uncertainty (resulting from mechanical and navigation errors and wind) is constructed to train and evaluate the proposed approach. The experimental results demonstrate that the proposed method can resolve conflicts in scenarios with much higher traffic density than in today's real-world situations.

Item (Open Access): A hybrid motion planning framework for autonomous driving in mixed traffic flow (Elsevier, 2022-11-28)
Authors: Yang, Lei; Lu, Chao; Xiong, Guangming; Xing, Yang; Gong, Jianwei
Abstract: As a core part of an autonomous driving system, motion planning plays an important role in safe driving. However, traditional model- and rule-based methods lack the ability to learn interactively from the environment, while learning-based methods still have reliability problems. To overcome these problems, a hybrid motion planning framework (HMPF), composed of learning-based behavior planning and optimization-based trajectory planning, is proposed to improve motion planning performance. The behavior planning module adopts a deep reinforcement learning (DRL) algorithm, which learns from the interaction between the ego vehicle (EV) and other human-driven vehicles (HDVs) and generates behavior decision commands based on environmental perception information. In particular, the intelligent driver model (IDM), calibrated on real driving data, is used to drive the HDVs so that they imitate human driving behavior and interactive responses, simulating the bidirectional interaction between the EV and HDVs. Meanwhile, the trajectory planning module adopts an optimization method based on road Frenet coordinates, which generates a safe and comfortable desired trajectory while reducing the dimensionality of the problem. In addition, trajectory planning also serves as a hard safety constraint on behavior planning, ensuring the feasibility of the decision commands. The experimental results demonstrate the effectiveness and feasibility of the proposed HMPF for autonomous driving motion planning in urban mixed traffic flow scenarios.
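The hybrid motion planning abstract above relies on the intelligent driver model (IDM) to make the surrounding human-driven vehicles react realistically. The IDM acceleration law itself is standard; a minimal sketch follows, using common textbook parameter defaults rather than the calibrated values from the paper.

    # Minimal sketch of the standard IDM car-following acceleration law.
    # Parameter values are textbook defaults, not the paper's calibration.
    import math

    def idm_accel(v, dv, s, v0=33.3, T=1.5, a_max=1.0, b=1.5, s0=2.0, delta=4):
        """IDM acceleration (m/s^2).
        v: ego speed (m/s); dv: approach rate v - v_leader (m/s);
        s: bumper-to-bumper gap to the leader (m)."""
        # Desired dynamic gap: jam distance + time headway + braking term
        s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
        return a_max * (1.0 - (v / v0) ** delta - (s_star / s) ** 2)

    # Example: following a slightly slower leader 50 m ahead
    print(f"IDM acceleration: {idm_accel(v=20.0, dv=2.0, s=50.0):.2f} m/s^2")

Because the free-flow term and the interaction term are combined in one smooth expression, the same calibrated parameter set reproduces both cruising and car-following behavior, which is what makes IDM a convenient stand-in for human drivers when training the DRL behavior planner.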
Item (Open Access): Locally generalised multi-agent reinforcement learning for demand and capacity balancing with customised neural networks (Elsevier, 2023-04-18)
Authors: Chen, Yutong; Hu, Minghua; Xu, Yan; Yang, Lei
Abstract: Reinforcement Learning (RL) techniques are being studied to solve Demand and Capacity Balancing (DCB) problems and to fully exploit their computational performance. A locally generalised Multi-Agent Reinforcement Learning (MARL) method for real-world DCB problems is proposed. The method can deploy trained agents directly to unseen scenarios in a specific Air Traffic Flow Management (ATFM) region and quickly obtain a satisfactory solution. In this method, the agents of all flights in a scenario form a multi-agent decision-making system based on partial observation. A trained agent with a customised neural network can be deployed directly on the corresponding flight, allowing the agents to solve the DCB problem jointly. A cooperation coefficient, introduced in the reward function, adjusts each agent's cooperation preference in the multi-agent system and thereby controls the distribution of flight delay times. A multi-iteration mechanism is designed for the DCB decision-making framework to deal with problems arising from non-stationarity in MARL and to ensure that all hotspots are eliminated. Experiments based on large-scale, high-complexity real-world scenarios verify the effectiveness and efficiency of the method. Statistically, the proposed method is shown to generalise within the scope of the flights and sectors of interest, and its optimisation performance outperforms standard computer-assisted slot allocation and state-of-the-art RL-based DCB methods. A sensitivity analysis preliminarily reveals the effect of the cooperation coefficient on delay time allocation.
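The cooperation coefficient described in the last abstract weights an agent's own delay cost against the cost borne by the other flights. The paper's actual reward terms are not given here; the sketch below only illustrates, under assumed cost definitions and an assumed hotspot bonus, how such a coefficient can shift agents between selfish and cooperative delay allocation.

    # Illustrative sketch of a reward with a cooperation coefficient.
    # Cost terms and the hotspot bonus are assumptions, not the paper's reward.
    def shaped_reward(own_delay, peer_delays, hotspot_resolved, coop=0.5):
        """Mix the agent's own delay cost with the mean cost of the other
        flights, weighted by a cooperation coefficient `coop` in [0, 1]."""
        own_cost = -own_delay
        team_cost = -sum(peer_delays) / max(len(peer_delays), 1)
        bonus = 10.0 if hotspot_resolved else 0.0   # assumed hotspot bonus
        return (1.0 - coop) * own_cost + coop * team_cost + bonus

    # coop=0 -> purely selfish agent; coop=1 -> purely team-oriented agent
    print(shaped_reward(own_delay=4.0, peer_delays=[2.0, 6.0],
                        hotspot_resolved=True, coop=0.3))

Sweeping `coop` in such a formulation changes how evenly delay is spread across flights, which matches the sensitivity analysis the abstract reports on delay time allocation.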