TY - JOUR
T1 - Comparative evaluation of deep learning workloads for leadership-class systems
AU - Yin, Junqi
AU - Tsaris, Aristeidis
AU - Dash, Sajal
AU - Miller, Ross
AU - Wang, Feiyi
AU - Shankar, Mallikarjun (Arjun)
N1 - Publisher Copyright: © 2021 The Authors
PY - 2021/10
Y1 - 2021/10
N2 - Deep learning (DL) workloads and their performance at scale are becoming important factors to consider as we design, develop, and deploy next-generation high-performance computing systems. Since DL applications rely heavily on DL frameworks and the underlying compute (CPU/GPU) stacks, it is essential to gain a holistic understanding of the compute kernels, models, and frameworks of popular DL stacks, and to assess their impact on science-driven, mission-critical applications. At the Oak Ridge Leadership Computing Facility (OLCF), we employ a set of micro and macro DL benchmarks established through the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) to evaluate the AI readiness of our next-generation supercomputers. In this paper, we present our early observations and performance benchmark comparisons between the Nvidia V100-based Summit system with its CUDA stack and an AMD MI100-based testbed system with its ROCm stack. We take a layered perspective on DL benchmarking and point to opportunities for future optimizations in the technologies that we consider.
AB - Deep learning (DL) workloads and their performance at scale are becoming important factors to consider as we design, develop, and deploy next-generation high-performance computing systems. Since DL applications rely heavily on DL frameworks and the underlying compute (CPU/GPU) stacks, it is essential to gain a holistic understanding of the compute kernels, models, and frameworks of popular DL stacks, and to assess their impact on science-driven, mission-critical applications. At the Oak Ridge Leadership Computing Facility (OLCF), we employ a set of micro and macro DL benchmarks established through the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) to evaluate the AI readiness of our next-generation supercomputers. In this paper, we present our early observations and performance benchmark comparisons between the Nvidia V100-based Summit system with its CUDA stack and an AMD MI100-based testbed system with its ROCm stack. We take a layered perspective on DL benchmarking and point to opportunities for future optimizations in the technologies that we consider.
KW - CORAL benchmark
KW - Deep learning stack
KW - ROCm
UR - http://www.scopus.com/inward/record.url?scp=85147957012&partnerID=8YFLogxK
U2 - 10.1016/j.tbench.2021.100005
DO - 10.1016/j.tbench.2021.100005
M3 - Article
AN - SCOPUS:85147957012
SN - 2772-4859
VL - 1
JO - BenchCouncil Transactions on Benchmarks, Standards and Evaluations
JF - BenchCouncil Transactions on Benchmarks, Standards and Evaluations
IS - 1
M1 - 100005
ER -