TY - JOUR
T1 - Early experiences on Summit
T2 - Data analytics and AI applications
AU - Womble, D. E.
AU - Shankar, M.
AU - Joubert, W.
AU - Johnston, J. T.
AU - Wells, J. C.
AU - Nichols, J. A.
N1 - Publisher Copyright:
© 1957-2012 IBM.
PY - 2019/11/1
Y1 - 2019/11/1
N2 - Oak Ridge National Laboratory (ORNL) installed the Summit supercomputer in 2018. Summit is an accelerated-node architecture with 4,608 nodes, each with two IBM P9 and six NVIDIA Volta V100 GPU processors, significant DRAM footprint, robust HBM quantities supporting the GPUs, nonvolatile memory, and fast NVLink and Infiniband interconnects. This machine was designed to deliver over 200 peak double-precision petaflops for scientific modeling and simulation applications and over 3 peak reduced-precision ExaOps. Summit features impact application performance depending on whether the codes are simulation-oriented, write-intensive, data-analysis-oriented, read-intensive, or communication-intensive codes. In the context of artificial intelligence (AI) and machine learning (ML), these features support data-intensive applications that infer and predict statistical relationships in complex datasets. This article presents recent experiences at ORNL using Summit for applications in AI and ML and describes example code and algorithmic changes necessary to use Summit effectively. Finally, this article discusses research directions in scalable ML, including, algorithms research and combining data analysis with modeling and simulation in an accelerated-node, exascale environment.
AB - Oak Ridge National Laboratory (ORNL) installed the Summit supercomputer in 2018. Summit is an accelerated-node architecture with 4,608 nodes, each with two IBM P9 and six NVIDIA Volta V100 GPU processors, significant DRAM footprint, robust HBM quantities supporting the GPUs, nonvolatile memory, and fast NVLink and Infiniband interconnects. This machine was designed to deliver over 200 peak double-precision petaflops for scientific modeling and simulation applications and over 3 peak reduced-precision ExaOps. Summit features impact application performance depending on whether the codes are simulation-oriented, write-intensive, data-analysis-oriented, read-intensive, or communication-intensive codes. In the context of artificial intelligence (AI) and machine learning (ML), these features support data-intensive applications that infer and predict statistical relationships in complex datasets. This article presents recent experiences at ORNL using Summit for applications in AI and ML and describes example code and algorithmic changes necessary to use Summit effectively. Finally, this article discusses research directions in scalable ML, including, algorithms research and combining data analysis with modeling and simulation in an accelerated-node, exascale environment.
UR - http://www.scopus.com/inward/record.url?scp=85074984627&partnerID=8YFLogxK
U2 - 10.1147/JRD.2019.2944146
DO - 10.1147/JRD.2019.2944146
M3 - Article
AN - SCOPUS:85074984627
SN - 0018-8646
VL - 63
JO - IBM Journal of Research and Development
JF - IBM Journal of Research and Development
IS - 6
M1 - 8851173
ER -