TY - GEN
T1 - Quantitative Evaluation of Autonomous Driving in CARLA
AU - Gao, Shang
AU - Paulissen, Spencer
AU - Coletti, Mark
AU - Patton, Robert
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - There have been many recent advancements in imitation and reinforcement learning for autonomous driving, but existing metrics generally lack the means to capture a wide range of driving behaviors and compare the severity of different failure cases. To address this shortcoming, we introduce Quantitative Evaluation for Driving (QED), which assesses different aspects of driving behavior including the ability to stay in the center of the lane, avoid weaving and erratic behavior, follow the speed limit, and avoid collisions. We compare scores generated by QED against scores assigned by human evaluators on 30 different drivers and 6 different towns in the CARLA driving simulator. In "easy" evaluation scenarios where better drivers are easily distinguished from worse drivers, QED attains 0.96 Pearson correlation and 0.97 Spearman correlation with human evaluators, similar to the baseline inter-human-evaluator 0.96 Pearson correlation and 0.95 Spearman correlation. In "hard" evaluation scenarios where ranking drivers is more ambiguous, QED attains 0.84 Pearson correlation and 0.74 Spearman correlation with human evaluators, slightly higher than the baseline inter-human-evaluator 0.78 Pearson correlation and 0.7 Spearman correlation. While QED may not capture every characteristic that defines good driving, we consider it an important foundation for reproducibility and standardization in the community.
AB - There have been many recent advancements in imitation and reinforcement learning for autonomous driving, but existing metrics generally lack the means to capture a wide range of driving behaviors and compare the severity of different failure cases. To address this shortcoming, we introduce Quantitative Evaluation for Driving (QED), which assesses different aspects of driving behavior including the ability to stay in the center of the lane, avoid weaving and erratic behavior, follow the speed limit, and avoid collisions. We compare scores generated by QED against scores assigned by human evaluators on 30 different drivers and 6 different towns in the CARLA driving simulator. In "easy" evaluation scenarios where better drivers are easily distinguished from worse drivers, QED attains 0.96 Pearson correlation and 0.97 Spearman correlation with human evaluators, similar to the baseline inter-human-evaluator 0.96 Pearson correlation and 0.95 Spearman correlation. In "hard" evaluation scenarios where ranking drivers is more ambiguous, QED attains 0.84 Pearson correlation and 0.74 Spearman correlation with human evaluators, slightly higher than the baseline inter-human-evaluator 0.78 Pearson correlation and 0.7 Spearman correlation. While QED may not capture every characteristic that defines good driving, we consider it an important foundation for reproducibility and standardization in the community.
UR - http://www.scopus.com/inward/record.url?scp=85124982233&partnerID=8YFLogxK
U2 - 10.1109/IVWorkshops54471.2021.9669240
DO - 10.1109/IVWorkshops54471.2021.9669240
M3 - Conference contribution
AN - SCOPUS:85124982233
T3 - IEEE Intelligent Vehicles Symposium, Proceedings
SP - 257
EP - 263
BT - 2021 IEEE Intelligent Vehicles Symposium Workshops, IV Workshops 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE Intelligent Vehicles Symposium Workshops, IV Workshops 2021
Y2 - 11 July 2021 through 17 July 2021
ER -