A Multi-faceted Approach to Job Placement for Improved Performance on Extreme-Scale Systems

Christopher Zimmer, Saurabh Gupta, Scott Atchley, Sudharshan S. Vazhkudai, Carl Albing

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

12 Scopus citations

Abstract

Job placement plays a pivotal role in application performance on supercomputers. We present a multi-faceted exploration of ways to influence placement in extreme-scale systems, with the goals of improving network performance and decreasing variability. In our first exploration, Scores, we developed a machine learning model that extracts features from a job's node allocation and grades performance. This identified several important node metrics that led to Dual-Ended scheduling, a means of reducing network contention without impacting utilization. In evaluations on the Titan supercomputer, we observed reductions in average hop-count of up to 50%. We also developed an improved node-layout strategy that targets a better balance between network latency and bandwidth. Replacing the default ALPS layout on Titan with this strategy resulted in an average runtime improvement of 10%. Both of these efforts underscore the importance of a job placement strategy that is cognizant of workload mixture and network topology.
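As an illustration of the hop-count metric mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that hop-count means the mean pairwise minimal hop distance between the nodes of one allocation on a 3D torus (Titan's Gemini interconnect is a 3D torus). The function names, torus dimensions, and node coordinates are hypothetical and chosen only for the example.

```python
from itertools import combinations


def torus_hops(a, b, dims):
    """Minimal hop count between two coordinates on a torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def average_hop_count(nodes, dims):
    """Mean pairwise hop count over all node pairs in one allocation."""
    pairs = list(combinations(nodes, 2))
    if not pairs:
        return 0.0
    return sum(torus_hops(a, b, dims) for a, b in pairs) / len(pairs)


# Hypothetical torus dimensions and allocations, for illustration only.
DIMS = (8, 8, 8)
compact = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
scattered = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (4, 4, 4)]

if __name__ == "__main__":
    print("compact allocation:  ", average_hop_count(compact, DIMS))
    print("scattered allocation:", average_hop_count(scattered, DIMS))
```

Under these assumptions, a compact allocation yields a lower average hop-count than a scattered one of the same size, which is the kind of difference a placement strategy would try to exploit.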

Original language: English
Title of host publication: Proceedings of SC 2016
Subtitle of host publication: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
Pages: 1015-1025
Number of pages: 11
ISBN (Electronic): 9781467388153
State: Published - Jul 2 2016
Event: 2016 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016 - Salt Lake City, United States
Duration: Nov 13 2016 to Nov 18 2016

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume: 0
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2016 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016
Country/Territory: United States
City: Salt Lake City
Period: 11/13/16 to 11/18/16
