TY - GEN
T1 - High-throughput computing on high-performance platforms
T2 - 13th IEEE International Conference on eScience, eScience 2017
AU - Oleynik, Danila
AU - Panitkin, Sergey
AU - Turilli, Matteo
AU - Angius, Alessio
AU - Oral, Sarp
AU - De, Kaushik
AU - Klimentov, Alexei
AU - Wells, Jack C.
AU - Jha, Shantenu
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/11/14
Y1 - 2017/11/14
N2 - The computing systems used by LHC experiments have historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size resources. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan, a DOE leadership facility, in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next-generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.
AB - The computing systems used by LHC experiments have historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size resources. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan, a DOE leadership facility, in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next-generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.
KW - high-performance and throughput computing
UR - http://www.scopus.com/inward/record.url?scp=85043785102&partnerID=8YFLogxK
U2 - 10.1109/eScience.2017.43
DO - 10.1109/eScience.2017.43
M3 - Conference contribution
AN - SCOPUS:85043785102
T3 - Proceedings - 13th IEEE International Conference on eScience, eScience 2017
SP - 295
EP - 304
BT - Proceedings - 13th IEEE International Conference on eScience, eScience 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 24 October 2017 through 27 October 2017
ER -