TY - JOUR
T1 - Implications of stop-and-go traffic on training learning-based car-following control
AU - Zhou, Anye
AU - Peeta, Srinivas
AU - Zhou, Hao
AU - Laval, Jorge
AU - Wang, Zejiang
AU - Cook, Adian
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2024
Y1 - 2024
N2 - Learning-based car-following control (LCC) of connected and autonomous vehicles (CAVs) is gaining significant attention with the advancement of computing power and data accessibility. While the flexibility and large model capacity of model-free architectures enable LCC to potentially outperform model-based car-following (CF) models in improving traffic efficiency and mitigating congestion, the generalizability of LCC to traffic conditions different from the training environment/dataset is not well understood. This study explores the impact of stop-and-go traffic in the training dataset on the generalizability of LCC. It uses the characteristics of lead vehicle trajectories to describe stop-and-go traffic, and links the theory of identifiability (i.e., obtaining a unique parameter estimation result from sensor measurements) to the generalizability of behavior cloning (BC) and policy-based deep reinforcement learning (DRL). Correspondingly, the study shows theoretically that: (i) stop-and-go traffic can enable the property of identifiability and enhance the control performance of BC-based LCC in different traffic conditions; (ii) stop-and-go traffic is not necessary for DRL-based LCC to generalize to different traffic conditions; (iii) DRL-based LCC trained with only constant-speed lead vehicle trajectories (not sufficient to ensure identifiability) can generalize to different traffic conditions; and (iv) stop-and-go traffic increases variance in the training dataset, which improves the convergence of parameter estimation while negatively impacting the convergence of DRL to the optimal control policy. Numerical experiments validate these findings, illustrating that BC-based LCC requires comprehensive training datasets to generalize to different traffic conditions, whereas DRL-based LCC can achieve generalization with simple free-flow traffic training environments. This further suggests DRL as a more promising and cost-effective LCC approach to reduce operational costs, mitigate traffic congestion, and enhance safety and mobility, which can accelerate the deployment and acceptance of CAVs.
AB - Learning-based car-following control (LCC) of connected and autonomous vehicles (CAVs) is gaining significant attention with the advancement of computing power and data accessibility. While the flexibility and large model capacity of model-free architectures enable LCC to potentially outperform model-based car-following (CF) models in improving traffic efficiency and mitigating congestion, the generalizability of LCC to traffic conditions different from the training environment/dataset is not well understood. This study explores the impact of stop-and-go traffic in the training dataset on the generalizability of LCC. It uses the characteristics of lead vehicle trajectories to describe stop-and-go traffic, and links the theory of identifiability (i.e., obtaining a unique parameter estimation result from sensor measurements) to the generalizability of behavior cloning (BC) and policy-based deep reinforcement learning (DRL). Correspondingly, the study shows theoretically that: (i) stop-and-go traffic can enable the property of identifiability and enhance the control performance of BC-based LCC in different traffic conditions; (ii) stop-and-go traffic is not necessary for DRL-based LCC to generalize to different traffic conditions; (iii) DRL-based LCC trained with only constant-speed lead vehicle trajectories (not sufficient to ensure identifiability) can generalize to different traffic conditions; and (iv) stop-and-go traffic increases variance in the training dataset, which improves the convergence of parameter estimation while negatively impacting the convergence of DRL to the optimal control policy. Numerical experiments validate these findings, illustrating that BC-based LCC requires comprehensive training datasets to generalize to different traffic conditions, whereas DRL-based LCC can achieve generalization with simple free-flow traffic training environments. This further suggests DRL as a more promising and cost-effective LCC approach to reduce operational costs, mitigate traffic congestion, and enhance safety and mobility, which can accelerate the deployment and acceptance of CAVs.
KW - Behavior cloning
KW - Car-following control
KW - Deep reinforcement learning
KW - Generalizability
KW - System identification
UR - http://www.scopus.com/inward/record.url?scp=85189820698&partnerID=8YFLogxK
U2 - 10.1016/j.trc.2024.104578
DO - 10.1016/j.trc.2024.104578
M3 - Article
AN - SCOPUS:85189820698
SN - 0968-090X
JO - Transportation Research Part C: Emerging Technologies
JF - Transportation Research Part C: Emerging Technologies
M1 - 104578
ER -