Enabling and exploiting flexible task assignment on GPU through SM-centric program transformations

Bo Wu, Guoyang Chen, Dong Li, Xipeng Shen, Jeffrey Vetter

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

76 Scopus citations

Abstract

A GPU's computing power lies in its abundant memory bandwidth and massive parallelism. However, its hardware thread schedulers, despite being able to quickly distribute computation to processors, often fail to capitalize on program characteristics effectively, achieving only a fraction of the GPU's full potential. Moreover, current GPUs do not allow programmers or compilers to control this thread scheduling, forfeiting important optimization opportunities at the program level. This paper presents a transformation centered on Streaming Multiprocessors (SM); this software approach to circumventing the limitations of the hardware scheduler allows flexible program-level control of scheduling. By permitting precise control of job locality on SMs, the transformation overcomes inherent limitations in prior methods. With this technique, flexible control of GPU scheduling at the program level becomes feasible, which opens up new opportunities for GPU program optimizations. The second part of the paper explores how the new opportunities could be leveraged for GPU performance enhancement, what complexities there are, and how to address them. We show that some simple optimization techniques can enhance co-runs of multiple kernels and improve data locality of irregular applications, producing 20-33% average increase in performance, system throughput, and average turnaround time.
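The abstract does not spell out the mechanism, so the following is only a CPU-side Python simulation of the general idea of choosing work by SM rather than by thread-block index. It is a sketch under stated assumptions, not the authors' implementation: the function `run_sm_centric`, the `jobs_by_sm` mapping, and the shuffled placement list standing in for the hardware scheduler are all hypothetical names introduced for illustration. It shows that if each worker looks up the SM it landed on and claims the next job from that SM's queue, the job-to-SM mapping is decided by the program no matter how the workers were placed.

```python
import random
from collections import defaultdict


def run_sm_centric(num_sms, workers_per_sm, jobs_by_sm, seed=0):
    """Simulate workers that pick jobs by the SM they run on, not by block id.

    jobs_by_sm maps an SM id to the list of job ids intended for that SM.
    Returns a dict mapping each SM id to the job ids executed on it, in order.
    """
    rng = random.Random(seed)

    # Hypothetical stand-in for the hardware scheduler: each worker ends up
    # on some SM, in an order the program cannot control.
    placements = [sm for sm in range(num_sms) for _ in range(workers_per_sm)]
    rng.shuffle(placements)

    next_job = defaultdict(int)   # per-SM cursor (an atomic counter on a GPU)
    executed = defaultdict(list)  # what actually ran on each SM

    # Each worker repeatedly claims the next job from its own SM's queue
    # and retires once that queue is drained.
    active = list(placements)
    while active:
        still_active = []
        for sm in active:
            queue = jobs_by_sm.get(sm, [])
            idx = next_job[sm]
            if idx < len(queue):
                next_job[sm] += 1
                executed[sm].append(queue[idx])
                still_active.append(sm)
        active = still_active
    return dict(executed)


if __name__ == "__main__":
    plan = {0: [0, 1, 2], 1: [3, 4], 2: [5, 6, 7, 8]}
    print(run_sm_centric(num_sms=3, workers_per_sm=2, jobs_by_sm=plan))
```

Because the queue lookup is keyed on the SM id, changing `seed` (that is, changing the simulated placement order) does not change which jobs run on which SM. On real hardware the per-SM cursor would need to be an atomic counter, and the number of workers resident on each SM would need to be controlled separately.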

Original language: English
Title of host publication: ICS 2015 - Proceedings of the 29th ACM International Conference on Supercomputing
Publisher: Association for Computing Machinery
Pages: 119-130
Number of pages: 12
ISBN (Electronic): 9781450335591
State: Published - Jun 8 2015
Event: 29th ACM International Conference on Supercomputing, ICS 2015 - Newport Beach, United States
Duration: Jun 8 2015 to Jun 11 2015

Publication series

Name: Proceedings of the International Conference on Supercomputing
Volume: 2015-June

Conference

Conference: 29th ACM International Conference on Supercomputing, ICS 2015
Country/Territory: United States
City: Newport Beach
Period: 06/8/15 to 06/11/15

Keywords

  • Compiler transformation
  • Data affinity
  • GPGPU
  • Program co-run
  • Scheduling
