TY - GEN
T1 - On-policy Approximate Dynamic Programming for Optimal Control of non-linear systems
AU - Shalini, K.
AU - Vrushabh, D.
AU - Sonam, K.
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/6/29
Y1 - 2020/6/29
N2 - Optimal control theory deals with finding the policy that minimizes a discounted infinite-horizon quadratic cost function. To obtain the optimal control policy, the Hamilton-Jacobi-Bellman (HJB) equation must be solved, i.e., the value function satisfying the Bellman equation must be found. However, the HJB equation is a partial differential equation that is difficult to solve for nonlinear systems. This paper employs the approximate dynamic programming method to solve the HJB equation for deterministic nonlinear discrete-time systems with continuous state and action spaces. The approximate solution of the HJB equation is obtained by a policy iteration algorithm built on an actor-critic architecture. The control policy and value function are approximated using function approximators, namely neural networks represented as linear combinations of linearly independent basis functions. A gradient descent optimization algorithm is employed to tune the weights of the actor and critic networks. The control algorithm is implemented for the cart-pole inverted pendulum system, and the effectiveness of this approach is demonstrated in simulations.
AB - Optimal control theory deals with finding the policy that minimizes a discounted infinite-horizon quadratic cost function. To obtain the optimal control policy, the Hamilton-Jacobi-Bellman (HJB) equation must be solved, i.e., the value function satisfying the Bellman equation must be found. However, the HJB equation is a partial differential equation that is difficult to solve for nonlinear systems. This paper employs the approximate dynamic programming method to solve the HJB equation for deterministic nonlinear discrete-time systems with continuous state and action spaces. The approximate solution of the HJB equation is obtained by a policy iteration algorithm built on an actor-critic architecture. The control policy and value function are approximated using function approximators, namely neural networks represented as linear combinations of linearly independent basis functions. A gradient descent optimization algorithm is employed to tune the weights of the actor and critic networks. The control algorithm is implemented for the cart-pole inverted pendulum system, and the effectiveness of this approach is demonstrated in simulations.
KW - Approximate Dynamic Programming (ADP)
KW - Gradient Descent
KW - Hamilton-Jacobi-Bellman (HJB)
KW - Optimal Control
UR - http://www.scopus.com/inward/record.url?scp=85098240206&partnerID=8YFLogxK
U2 - 10.1109/CoDIT49905.2020.9263879
DO - 10.1109/CoDIT49905.2020.9263879
M3 - Conference contribution
AN - SCOPUS:85098240206
T3 - 7th International Conference on Control, Decision and Information Technologies, CoDIT 2020
SP - 1058
EP - 1062
BT - 7th International Conference on Control, Decision and Information Technologies, CoDIT 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th International Conference on Control, Decision and Information Technologies, CoDIT 2020
Y2 - 29 June 2020 through 2 July 2020
ER -