Fast parallel connected components algorithms on GPUs

Guojing Cong, Paul Muzio

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

5 Scopus citations

Abstract

We study parallel connected components algorithms onGPUs in comparison with CPUs. Although straightforward implementation of PRAMalgorithms performs relatively better on GPUs than on CPUs, the GPU memory subsystem performance is poor due to non-coalesced random accesses.

We argue that generic sort-based access coalescing is too costly on GPUs.We propose a new coalescing technique and a new meta algorithm to improve locality and performance. Our optimization achieves up to 2.7 times speedup over the straightforward implementation. Interestingly, our optimization also works well on CPUs.

Comparing the best-performing algorithms on GPUs and CPUs, we find our new algorithm is the fastest on GPUs and the second fastest on CPUs, while the parallel Rem’s algorithm is the fastest on CPUs but does not perform well on GPUs due to path divergence.

Original languageEnglish
Title of host publicationEuro-Par 2014
Subtitle of host publicationParallel Processing Workshops - Euro-Par 2014 International Workshops, Revised Selected Papers
EditorsLuís Lopes
PublisherSpringer Verlag
Pages153-164
Number of pages12
EditionPart 1
ISBN (Electronic)9783319143248
DOIs
StatePublished - 2014
Externally publishedYes

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
NumberPart 1
Volume8805
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Keywords

  • CPU
  • Connected Components
  • GPU
  • Multi-core

Fingerprint

Dive into the research topics of 'Fast parallel connected components algorithms on GPUs'. Together they form a unique fingerprint.

Cite this