Cost inference of discrete-time linear quadratic control policies using human-behaviour learning

Date published

2022-06-30

Publisher

IEEE

Type

Conference paper

ISSN

2576-3547

Citation

Perrusquia A, Guo W. (2022) Cost inference of discrete-time linear quadratic control policies using human-behaviour learning. In: CODiT 2022: 8th International Conference on Control, Decision and Information Technologies, 17-20 May 2022, Istanbul, Turkey, pp. 165-170

Abstract

In this paper, a cost inference algorithm for discrete-time systems using human-behaviour learning is proposed. The approach is inspired by the complementary learning exhibited by the neocortex, hippocampus, and striatum learning systems to achieve complex decision making. The main objective is to infer the hidden cost function from the expert's data associated with the hippocampus (off-policy data) and transfer it to the neocortex for policy generalization (on-policy data) in different systems and environments. The neocortex is modelled by a Q-learning algorithm and a least-squares identification algorithm for on-policy learning and system identification, respectively. The cost inference is obtained using a one-step gradient descent rule and an inverse optimal control algorithm. Convergence of the cost inference algorithm is discussed using Lyapunov recursions. Simulations verify the effectiveness of the approach.
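
To make two of the building blocks named in the abstract concrete, the following is a minimal sketch of least-squares system identification followed by a discrete-time linear quadratic gain computation. It is not the authors' algorithm: the cost inference step (gradient descent and inverse optimal control) and the Q-learning component are not shown, a plain Riccati iteration is used in their place, and the double-integrator model, the weights `Q` and `R`, and the function names `identify_system` and `lqr_gain` are all assumptions chosen for illustration.

```python
import numpy as np


def identify_system(X, U, X_next):
    """Least-squares estimate of (A, B) for x_{k+1} = A x_k + B u_k.

    X, X_next have shape (N, n); U has shape (N, m).
    """
    Z = np.hstack([X, U])                                # regressors, (N, n + m)
    theta, *_ = np.linalg.lstsq(Z, X_next, rcond=None)   # theta = [A^T; B^T]
    n = X.shape[1]
    return theta[:n].T, theta[n:].T


def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion; return the gain K and P."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K, P


# Hypothetical example system (a sampled double integrator), assumed here.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])

# Noise-free data from random states and inputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
U = rng.standard_normal((200, 1))
X_next = X @ A.T + U @ B.T

A_hat, B_hat = identify_system(X, U, X_next)

# Assumed cost weights; in the paper's setting these are what is inferred.
Q = np.eye(2)
R = np.eye(1)
K, P = lqr_gain(A_hat, B_hat, Q, R)

# Spectral radius of the closed loop under the policy u_k = -K x_k.
rho = max(abs(np.linalg.eigvals(A - B @ K)))
```

In this sketch the weights `Q` and `R` are known in advance; the paper addresses the inverse setting, where they are hidden and must be recovered from the expert's data.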

Keywords

Learning systems, Costs, Q-learning, Decision making, Optimal control, Cost function, Inference algorithms

Rights

Attribution-NonCommercial 4.0 International
