Abstract
We address the reward-shaping problem for large-scale multiagent systems (MASs) using inverse reinforcement learning (IRL). The learning MAS has no prior knowledge of the target MAS's cost function and aims to reconstruct it from the target's demonstrations. We propose a scalable model-free IRL algorithm for large-scale MASs in which dynamic mode decomposition (DMD) extracts the dominant dynamic modes and builds a projection matrix, significantly reducing the amount of data required while retaining the system's essential dynamic information. Proofs of the algorithm's convergence and stability, and of the non-uniqueness of the state-reward weight, are presented. The efficacy of our method is validated on a large-scale consensus network by comparing the data sizes and computation times required for reward shaping with and without DMD.
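The abstract gives no implementation details, so as a rough illustration of the DMD-based dimensionality reduction it describes, the sketch below builds a projection matrix from state snapshots via exact DMD. The function name `dmd_projection`, its interface, the retained rank `r`, and the toy consensus-style dynamics in the usage example are illustrative assumptions, not the paper's algorithm or API.

```python
import numpy as np

def dmd_projection(X, Xp, r):
    """Build a rank-r projection matrix from snapshot data via exact DMD.

    X  : (n, m) matrix of states x_0 ... x_{m-1}
    Xp : (n, m) matrix of time-shifted states x_1 ... x_m
    r  : number of retained modes (reduced dimension); assumed, not from the paper
    """
    # Truncated SVD of the snapshot matrix
    U, S, Vh = np.linalg.svd(X, full_matrices=False)
    Ur, Sr, Vr = U[:, :r], S[:r], Vh[:r, :].conj().T

    # Reduced-order linear operator: A_tilde = Ur* Xp Vr Sr^{-1}
    A_tilde = Ur.conj().T @ Xp @ Vr @ np.diag(1.0 / Sr)

    # Eigendecomposition gives the dominant dynamic modes
    eigvals, W = np.linalg.eig(A_tilde)
    Phi = Xp @ Vr @ np.diag(1.0 / Sr) @ W  # exact DMD modes

    # Ur serves as the projection matrix: x_reduced = Ur.T @ x
    return Ur, A_tilde, eigvals, Phi

# Illustrative usage: snapshots from stand-in stable linear dynamics,
# reduced from n = 200 states to r = 10 modes.
n, m, r = 200, 50, 10
rng = np.random.default_rng(0)
A = np.eye(n) - 0.01 * rng.random((n, n))
X = np.empty((n, m))
X[:, 0] = rng.random(n)
for k in range(1, m):
    X[:, k] = A @ X[:, k - 1]

Ur, A_tilde, eigvals, Phi = dmd_projection(X[:, :-1], X[:, 1:], r)
x_reduced = Ur.T @ X[:, 0]  # project a full state into reduced coordinates
```

Projecting demonstrations through `Ur` is what lets a learning algorithm operate on `r`-dimensional data instead of the full `n`-dimensional state, which is the data-size reduction the abstract refers to.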
| Original language | English |
|---|---|
| Pages (from-to) | 1-12 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Control of Network Systems |
| State | Accepted/In press - 2024 |
| Externally published | Yes |
Keywords
- Artificial neural networks
- Control systems
- Data-driven control
- Dimensionality reduction
- Dynamic mode decomposition
- Heuristic algorithms
- Inverse reinforcement learning
- Large-scale system
- Network systems
- Optimal control
- Stability criteria