Preparation and optimization of a diverse workload for a large-scale heterogeneous system

Ian Karlin, Yoonho Park, Bronis R. De Supinski, Peng Wang, Bert Still, David Beckingsale, Robert Blake, Tong Chen, Guojing Cong, Carlos Costa, Johann Dahm, Giacomo Domeniconi, Thomas Epperly, Aaron Fisher, Sara Kokkila Schumacher, Steven Langer, Hai Le, Eun Kyung Lee, Naoya Maruyama, Xinyu Que, David Richards, Bjorn Sjogreen, Jonathan Wong, Carol Woodward, Ulrike Yang, Xiaohua Zhang, Bob Anderson, David Appelhans, Levi Barnes, Peter Barnes, Sorin Bastea, David Boehme, Jamie A. Bramwell, Jim Brase, Jose Brunheroto, Barry Chen, Charway R. Cooper, Tony Degroot, Rob Falgout, Todd Gamblin, David Gardner, James Glosli, John Gunnels, Max Katz, Tzanio Kolev, I. Feng W. Kuo, Matthew P. Legendre, Ruipeng Li, Pei Hung Lin, Shelby Lockhart, Kathleen McCandless, Claudia Misale, Jaime Moreno, Rob Neely, Jarom Nelson, Rao Nimmakayala, Kathryn O'Brien, Kevin O'Brien, Ramesh Pankajakshan, Roger Pearce, Slaven Peles, Phil Regier, Steve Rennich, Martin Schulz, Howard Scott, James Sexton, Kathleen Shoga, Shiv Sundram, G. Thomas-Collignon, Brian Van Essen, Alexey Voronin, Bob Walkup, Lu Wang, Chris Ward, Hui Fang Wen, Dan White, Christopher Young, Cyril Zeller, Ed Zywicz

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

2 Scopus citations

Abstract

Productivity from day one on supercomputers that leverage new technologies requires significant preparation. An institution that procures a novel system architecture often lacks sufficient institutional knowledge and skills to prepare for it. Thus, the "Center of Excellence" (CoE) concept has emerged to prepare for systems such as Summit and Sierra, currently the top two systems in the TOP500. This paper documents CoE experiences that prepared a workload of diverse applications and math libraries for a heterogeneous system. We describe our approach to this preparation, including our management and execution strategies, and detail our experiences with and reasons for using different programming approaches. Our early science and performance results show that the project enabled significant early seismic science with up to a 14X throughput increase over Cori. In addition to our successes, we discuss our challenges and failures so others may benefit from our experience.

Original language: English
Title of host publication: Proceedings of SC 2019
Subtitle of host publication: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
ISBN (Electronic): 9781450362290
State: Published - Nov 17 2019
Externally published: Yes
Event: 2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019 - Denver, United States
Duration: Nov 17 2019 to Nov 22 2019

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019
Country/Territory: United States
City: Denver
Period: 11/17/19 to 11/22/19

Funding

Prepared by LLNL under Contract DE-AC52-07NA27344. LLNL-CONF-772139. IBM and NVIDIA participation was supported under CORAL NRE Contract B604142.

Funders (funder number):
NRE (B604142)
International Business Machines Corporation
NVIDIA

    Keywords

    • GPUs
    • Heterogeneous systems
    • Large-scale applications
    • Performance
    • Project management
    • Programming models
