HierKNEM: An adaptive framework for kernel-assisted and topology-aware collective communications on many-core clusters

Teng Ma, George Bosilca, Aurelien Bouteiller, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

24 Scopus citations

Abstract

Multicore clusters, which have become the most prominent form of High Performance Computing (HPC) systems, challenge the performance of MPI applications with non-uniform memory accesses and shared cache hierarchies. Recent advances in MPI collective communications have alleviated the performance issues exposed by deep memory hierarchies by carefully considering the mapping between the collective topology and the core distance, as well as by using single-copy kernel-assisted mechanisms. However, in distributed environments, a single-level approach cannot encompass the extreme variations not only in bandwidth and latency capabilities, but also in the ability to support duplex communications or to perform multiple concurrent copies. This calls for a collaborative approach between multiple layers of collective algorithms, dedicated to extracting the maximum degree of parallelism from the collective algorithm by consolidating the intra- and inter-node communications. In this work, we present HierKNEM, a kernel-assisted topology-aware collective framework, and describe how this framework orchestrates the collaboration between multiple layers of collective algorithms. The resulting scheme enables perfect overlap of intra- and inter-node communications. We demonstrate experimentally, by considering three of the most used collective operations (Broadcast, Allgather and Reduction), that 1) this approach is immune to modifications of the underlying process-core binding, 2) it outperforms state-of-the-art MPI libraries (Open MPI, MPICH2 and MVAPICH2), demonstrating up to a 30x speedup for synthetic benchmarks and up to a 3x acceleration for a parallel graph application (ASP), and 3) it demonstrates a linear speedup with the increase in the number of cores per node, a paramount requirement for scalability on future many-core hardware.

Original language: English
Title of host publication: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Pages: 970-982
Number of pages: 13
State: Published - 2012
Event: 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012 - Shanghai, China
Duration: May 21 2012 to May 25 2012

Publication series

Name: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012

Conference

Conference: 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Country/Territory: China
City: Shanghai
Period: 05/21/12 to 05/25/12

Keywords

  • HPC
  • MPI
  • cluster
  • collective communication
  • hierarchical
  • multicore
