DISP: Optimizations towards Scalable MPI Startup

Huansong Fu, Swaroop Pophale, Manjunath Gorentla Venkata, Weikuan Yu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

Despite the popularity of MPI for high performance computing, the startup of MPI programs faces a scalability challenge: both execution time and memory consumption increase drastically at scale. We have examined this problem using the Cheetah and Tuned collective modules in Open MPI as representative implementations. Previous improvements for collectives have focused on algorithmic advances and hardware offload. In this paper, we examine the startup cost of the collective module within a communicator and explore various techniques to improve its efficiency and scalability. Accordingly, we have developed a new scalable startup scheme built on three internal techniques: Delayed Initialization, Module Sharing, and Prediction-based Topology Setup (DISP). Our DISP scheme greatly benefits the collective initialization of the Cheetah module. At the same time, it helps boost the performance of non-collective initialization in the Tuned module. We evaluate the performance of our implementation on the Titan supercomputer at ORNL with up to 4096 processes. The results show that delayed initialization speeds up the startup of Tuned and Cheetah by an average of 32.0% and 29.2%, respectively; module sharing reduces the memory consumption of Tuned and Cheetah by up to 24.1% and 83.5%, respectively; and prediction-based topology setup speeds up the startup of Cheetah by up to 80%.

Original language: English
Title of host publication: Proceedings of COM-HPC 2016
Subtitle of host publication: 1st Workshop on Optimization of Communication in HPC Runtime Systems - Held in conjunction with SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 53-62
Number of pages: 10
ISBN (Electronic): 9781509038299
DOIs
State: Published - Jan 23 2017
Event: 1st Workshop on Optimization of Communication in HPC Runtime Systems, COM-HPC 2016 - Salt Lake City, United States
Duration: Nov 18 2016 → …

Publication series

Name: Proceedings of COM-HPC 2016: 1st Workshop on Optimization of Communication in HPC Runtime Systems - Held in conjunction with SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 1st Workshop on Optimization of Communication in HPC Runtime Systems, COM-HPC 2016
Country/Territory: United States
City: Salt Lake City
Period: 11/18/16 → …
