Inverse Reinforcement Learning Control for Linear Multiplayer Games

Bosen Lian, Vrushabh S. Donge, Frank L. Lewis, Tianyou Chai, Ali Davoudi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

This paper proposes model-based and model-free inverse reinforcement learning (RL) control algorithms for multiplayer game systems described by linear continuous-time differential equations. Both algorithms enable the learner to recover the same optimal control policies and trajectories as the expert by inferring the expert players' unknown cost functions from the expert's trajectories. The paper first develops a model-based inverse RL policy iteration that consists of 1) policy evaluation of the cost matrices using a Lyapunov equation, 2) state-reward weight improvement using inverse optimal control (IOC), and 3) policy improvement using optimal control. Building on the model-based algorithm, an online data-driven inverse RL algorithm is then proposed that requires neither the system dynamics nor the expert control gains. Rigorous convergence and stability analyses of both algorithms are provided. Finally, a simulation example verifies the approach.
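To make the abstract's three-step loop concrete, the following Python sketch illustrates one plausible realization for a single-player linear-quadratic simplification. It is not the paper's multiplayer algorithm: the matrices A, B, R, the expert gain K_exp, the initial stabilizing gain K0, and the specific update laws shown are illustrative assumptions.

```python
# Minimal sketch of a three-step model-based inverse-RL policy iteration,
# simplified to a single-player linear-quadratic setting.  The update laws
# below are illustrative assumptions, not the paper's exact equations.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def inverse_rl_policy_iteration(A, B, R, K_exp, Q0, K0, n_iter=100, tol=1e-8):
    """Infer a state weight Q that reproduces the expert gain K_exp.

    K0 must be a stabilizing gain so every Lyapunov equation is solvable.
    """
    Q, K, P = Q0, K0, None
    for _ in range(n_iter):
        # 1) Policy evaluation: solve the Lyapunov equation
        #    (A - B K)^T P + P (A - B K) = -(Q + K^T R K) for the cost matrix P.
        A_cl = A - B @ K
        P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
        # 2) State-reward weight improvement (IOC step): choose the Q that
        #    makes P consistent with the expert's closed loop A - B K_exp.
        A_exp = A - B @ K_exp
        Q_new = -(A_exp.T @ P + P @ A_exp) - K_exp.T @ R @ K_exp
        # 3) Policy improvement: standard LQR gain from the current P.
        K_new = np.linalg.solve(R, B.T @ P)
        if np.linalg.norm(K_new - K) < tol and np.linalg.norm(Q_new - Q) < tol:
            break
        Q, K = Q_new, K_new
    return Q, K, P
```

In the paper's setting, steps 1–3 are extended to multiple players and, in the model-free variant, carried out from online trajectory data without knowledge of the system dynamics or the expert control gains.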

Original language: English
Title of host publication: 2022 IEEE 61st Conference on Decision and Control, CDC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2839-2844
Number of pages: 6
ISBN (Electronic): 9781665467612
DOIs
State: Published - 2022
Externally published: Yes
Event: 61st IEEE Conference on Decision and Control, CDC 2022 - Cancun, Mexico
Duration: Dec 6, 2022 – Dec 9, 2022

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2022-December
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 61st IEEE Conference on Decision and Control, CDC 2022
Country/Territory: Mexico
City: Cancun
Period: 12/6/22 – 12/9/22
