Abstract
In this article, we address the reward-shaping problem for large-scale multiagent systems (MASs) using inverse reinforcement learning (IRL). The learning MAS has no prior knowledge of the target MAS's cost function and aims to reconstruct it from the target's demonstrations. We propose a scalable, model-free IRL algorithm for large-scale MASs in which dynamic mode decomposition (DMD) extracts the dominant dynamic modes and builds a projection matrix. This significantly reduces the amount of data required while retaining the system's essential dynamic information. Proofs of the algorithm's convergence and stability, and of the nonuniqueness of the state reward weight, are presented. The efficacy of our method is validated on a large-scale consensus network by comparing the data sizes and computation times required for reward shaping with and without DMD.
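To make the projection step concrete, the following is a minimal sketch (not the authors' implementation) of how exact DMD can build a rank-r projection matrix from snapshot pairs of demonstration data; the function name `build_dmd_projection` and all dimensions are illustrative assumptions.

```python
import numpy as np

def build_dmd_projection(X, Xp, r):
    """Return a rank-r projection basis from snapshot pairs (X, Xp).

    X  : n x m matrix of states x_0 ... x_{m-1}
    Xp : n x m matrix of shifted states x_1 ... x_m
    r  : number of dominant dynamic modes to retain (r << n)
    """
    # Reduced SVD of the snapshot matrix.
    U, S, Vh = np.linalg.svd(X, full_matrices=False)
    Ur, Sr, Vr = U[:, :r], np.diag(S[:r]), Vh[:r, :].conj().T

    # Reduced-order linear operator: A_tilde = Ur* Xp Vr Sr^{-1}.
    A_tilde = Ur.conj().T @ Xp @ Vr @ np.linalg.inv(Sr)

    # Eigenvalues of A_tilde are the DMD eigenvalues; the columns of Ur
    # span the subspace of the r dominant modes and act as the projection.
    eigvals, _ = np.linalg.eig(A_tilde)
    return Ur, eigvals

# Hypothetical usage: compress n-dimensional demonstrations to r coordinates
# before running the IRL reward-shaping loop on the reduced data.
n, m, r = 200, 50, 5                 # illustrative sizes
X  = np.random.randn(n, m)           # placeholder snapshot data
Xp = np.random.randn(n, m)
Ur, lam = build_dmd_projection(X, Xp, r)
X_reduced = Ur.conj().T @ X          # r x m reduced-order trajectory
```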
| Original language | English |
|---|---|
| Pages (from-to) | 687-699 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Control of Network Systems |
| Volume | 12 |
| Issue number | 1 |
| DOIs | |
| State | Published - 2025 |
| Externally published | Yes |
Funding
This work was supported in part by the Army Research Office under Grant W911NF-20-1-0132.
Keywords
- Data-driven control
- dynamic mode decomposition (DMD)
- inverse reinforcement learning (IRL)
- large-scale system
- optimal control