TY - GEN
T1 - Inverse Reinforcement Learning Control for Linear Multiplayer Games
AU - Lian, Bosen
AU - Donge, Vrushabh S.
AU - Lewis, Frank L.
AU - Chai, Tianyou
AU - Davoudi, Ali
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - This paper proposes model-based and model-free inverse reinforcement learning (RL) control algorithms for multiplayer game systems described by linear continuous-time differential equations. Both algorithms find the learner the same optimal control policies and trajectories as the expert, by inferring the unknown expert players' cost functions from the expert's trajectories. This paper first discusses a model-based inverse RL policy iteration that consists of 1) policy evaluation for cost matrices using a Lyapunov equation, 2) state-reward weight improvement using inverse optimal control (IOC), and 3) policy improvement using optimal control. Based on the model-based algorithm, an online data-driven inverse RL algorithm is proposed without knowing system dynamics or expert control gains. Rigorous convergence and stability analysis of these algorithms are provided. Finally, a simulation example verifies our approach.
UR - http://www.scopus.com/inward/record.url?scp=85146985023&partnerID=8YFLogxK
U2 - 10.1109/CDC51059.2022.9993367
DO - 10.1109/CDC51059.2022.9993367
M3 - Conference contribution
AN - SCOPUS:85146985023
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 2839
EP - 2844
BT - 2022 IEEE 61st Conference on Decision and Control, CDC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 61st IEEE Conference on Decision and Control, CDC 2022
Y2 - 6 December 2022 through 9 December 2022
ER -