Data-Driven Inverse Reinforcement Learning Control for Linear Multiplayer Games

Bosen Lian, Vrushabh S. Donge, Frank L. Lewis, Tianyou Chai, Ali Davoudi

Research output: Contribution to journal › Article › peer-review

22 Scopus citations

Abstract

This article proposes a data-driven inverse reinforcement learning (RL) control algorithm for nonzero-sum multiplayer games in linear continuous-time differential dynamical systems. The inverse RL problem in these games is solved by a learner that reconstructs the unknown cost functions of the expert players from the experts' demonstrated optimal state and control input trajectories. The learner thus obtains the same control feedback gains and trajectories as the experts, using only data along system trajectories and without knowing the system dynamics. This article first proposes a model-based inverse RL policy iteration framework that has: 1) a policy evaluation step for reconstructing cost matrices using Lyapunov functions; 2) a state-reward weight improvement step using inverse optimal control (IOC); and 3) a policy improvement step using optimal control. Based on the model-based policy iteration algorithm, this article further develops an online data-driven off-policy inverse RL algorithm that requires no knowledge of the system dynamics or the expert control gains. Rigorous convergence and stability analyses of the algorithms are provided. The analysis shows that the off-policy inverse RL algorithm guarantees unbiased solutions even when probing noises are added to satisfy the persistence of excitation (PE) condition. Finally, two simulation examples validate the effectiveness of the proposed algorithms.
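For readers unfamiliar with the policy iteration structure the abstract refers to, the following is a minimal sketch of steps 1) and 3) only, under strong simplifying assumptions: a single player, a scalar system dx/dt = a*x + b*u, a known model, and a known cost with integrand q*x^2 + r*u^2. It is the classical forward policy iteration, not the article's inverse RL algorithm, and it omits step 2) (the IOC-based state-reward weight update), the multiplayer coupling, and the data-driven off-policy formulation. The function names and the numerical values are hypothetical and chosen for illustration.

```python
import math


def policy_iteration_scalar(a, b, q, r, k0, iters=20):
    """Forward policy iteration for dx/dt = a*x + b*u with u = -k*x.

    Illustrative assumption: scalar, single-player, model-based setting.
    """
    k = k0
    if a - b * k >= 0:
        raise ValueError("initial gain k0 must stabilize the closed loop")
    p = None
    for _ in range(iters):
        # Policy evaluation: solve the scalar Lyapunov equation
        #   2*(a - b*k)*p + q + r*k**2 = 0   for the value weight p.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: optimal-control gain update k = b*p/r.
        k = b * p / r
    return p, k


def riccati_scalar(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b*p)**2/r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    p, k = policy_iteration_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
    print(p, k, riccati_scalar(1.0, 1.0, 1.0, 1.0))
```

In this simplified setting the iterates approach the Riccati solution, which serves as a check on the evaluation and improvement steps. The article's setting differs in that the cost weights themselves are unknown and must be reconstructed from expert demonstrations.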

Original language: English
Pages (from-to): 2028-2041
Number of pages: 14
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 35
Issue number: 2
DOIs
State: Published - Feb 1 2024
Externally published: Yes

Keywords

  • Inverse optimal control (IOC)
  • inverse RL
  • nonzero-sum Nash games
  • off-policy
  • optimal control
