Constrained Deep Reinforcement Learning for Energy Sustainable Multi-UAV Based Random Access IoT Networks with NOMA

Sami Khairy, Prasanna Balaprakash, Lin X. Cai, Yu Cheng

Research output: Contribution to journal › Article › peer-review

68 Scopus citations

Abstract

In this paper, we apply the Non-Orthogonal Multiple Access (NOMA) technique to improve massive channel access in a wireless IoT network where solar-powered Unmanned Aerial Vehicles (UAVs) relay data from IoT devices to remote servers. Specifically, IoT devices contend for access to the shared wireless channel using an adaptive p-persistent slotted Aloha protocol, and the solar-powered UAVs adopt Successive Interference Cancellation (SIC) to decode multiple data packets received from IoT devices, which improves access efficiency. To enable an energy-sustainable, capacity-optimal network, we study the joint problem of dynamic multi-UAV altitude control and multi-cell wireless channel access management of IoT devices as a stochastic control problem with multiple energy constraints. We first formulate this problem as a Constrained Markov Decision Process (CMDP) and then propose an online model-free Constrained Deep Reinforcement Learning (CDRL) algorithm based on Lagrangian primal-dual policy optimization to solve the CMDP. Extensive simulations demonstrate that the proposed algorithm learns a cooperative policy in which the altitude of the UAVs and the channel access probability of the IoT devices are dynamically controlled to attain the maximal long-term network capacity while ensuring the energy sustainability of the UAVs, outperforming baseline schemes. The proposed CDRL agent can be trained on a small network, yet the learned policy can efficiently manage networks with a massive number of IoT devices and varying initial states, which can amortize the cost of training the CDRL agent.
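To make the channel access mechanism concrete, the following is a minimal sketch of p-persistent slotted Aloha with multi-packet reception, not the model used in the paper. It assumes an idealized SIC receiver that decodes every packet in a slot when at most `max_decodable` devices transmit and decodes nothing otherwise; the function names, the `max_decodable` parameter, and the grid search are hypothetical and introduced only for illustration. Real SIC performance depends on received powers and channel conditions, which this sketch ignores.

```python
from math import comb


def expected_decoded(n_devices: int, p: float, max_decodable: int) -> float:
    """Expected packets decoded per slot under an idealized SIC model.

    Each of n_devices transmits independently with probability p, so the
    number of transmitters is Binomial(n_devices, p). Assumption: a slot
    with k transmitters yields k decoded packets if 1 <= k <= max_decodable
    and zero packets otherwise.
    """
    total = 0.0
    for k in range(1, min(max_decodable, n_devices) + 1):
        prob_k = comb(n_devices, k) * p**k * (1 - p) ** (n_devices - k)
        total += k * prob_k
    return total


def best_access_probability(n_devices: int, max_decodable: int,
                            grid: int = 1000) -> float:
    """Grid-search the transmit probability that maximizes throughput."""
    candidates = [i / grid for i in range(1, grid + 1)]
    return max(candidates,
               key=lambda p: expected_decoded(n_devices, p, max_decodable))


if __name__ == "__main__":
    n = 10
    for k_max in (1, 2, 3):
        p_star = best_access_probability(n, k_max)
        rate = expected_decoded(n, p_star, k_max)
        print(f"max_decodable={k_max}: p*={p_star:.3f}, "
              f"packets/slot={rate:.3f}")
```

With `max_decodable=1` the model reduces to classic slotted Aloha, whose throughput peaks at a transmit probability of 1/N. Allowing the receiver to resolve more than one packet per slot raises the achievable throughput, which is the intuition behind combining NOMA with random access.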

Original language: English
Article number: 9177252
Pages (from-to): 1101-1115
Number of pages: 15
Journal: IEEE Journal on Selected Areas in Communications
Volume: 39
Issue number: 4
State: Published - Apr 2021
Externally published: Yes

Funding

Manuscript received January 31, 2020; revised June 7, 2020; accepted July 19, 2020. Date of publication August 25, 2020; date of current version March 17, 2021. This work was supported in part by the NSF under Grant ECCS-1554576, Grant ECCS-1610874, and Grant CNS-1816908; and in part by the U.S. Department of Energy, Office of Science, under Contract DE-AC02-06CH11357. (Corresponding author: Sami Khairy.) Sami Khairy, Lin X. Cai, and Yu Cheng are with the Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616 USA.

Keywords

  • Constrained deep reinforcement learning
  • UAV altitude control
  • energy sustainable IoT networks
  • non-orthogonal multiple access
  • p-persistent slotted Aloha
  • solar-powered UAVs
