Beyond the CPU: Hardware performance counter monitoring on Blue Gene/Q

Heike McCraw, Dan Terpstra, Jack Dongarra, Kris Davis, Roy Musselman

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

9 Scopus citations

Abstract

The Blue Gene/Q (BG/Q) system is the third generation in the IBM Blue Gene line of massively parallel, energy-efficient supercomputers, and it has grown not only in size but also in complexity compared to its Blue Gene predecessors. Consequently, gaining insight into the intricate ways in which software and hardware interact requires richer and more capable performance analysis methods to improve the efficiency and scalability of applications that run on this system. The BG/Q predecessor, Blue Gene/P, suffered from incompletely implemented hardware performance monitoring tools. To address these limitations, an industry/academic collaboration was established early in BG/Q's development cycle to ensure that effective performance tools would be available when the machine was introduced. An extensive effort has been made to extend the Performance API (PAPI) to support hardware performance monitoring on the BG/Q platform. This paper provides detailed information about five recently added PAPI components that allow hardware performance counter monitoring of the 5D-Torus network, the I/O system, and the Compute Node Kernel, in addition to the processing cores on BG/Q. Furthermore, we explore the impact of node mappings on the performance of a parallel 3D-FFT kernel and use the new PAPI network component to collect hardware performance counter data on the 5D-Torus network. The network counters revealed a large amount of redundant inter-node communication, which we were able to eliminate completely with a customized node mapping.
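Why node mapping matters on a torus network can be illustrated with a small calculation. The Python sketch below is not taken from the paper: it counts the minimal number of torus links between two nodes given their 5D coordinates, using the fact that each torus dimension wraps around. The function names, the partition shape, and the node coordinates are hypothetical, and the sketch ignores link contention and the details of the actual BG/Q routing hardware.

```python
from typing import Sequence


def ring_distance(a: int, b: int, n: int) -> int:
    """Fewest links between positions a and b on a ring of n nodes,
    allowing traffic to take the wraparound link."""
    d = abs(a - b)
    return min(d, n - d)


def torus_hops(x: Sequence[int], y: Sequence[int], shape: Sequence[int]) -> int:
    """Shortest-path hop count between nodes x and y on a torus of the given
    shape: the per-dimension ring distances simply add up."""
    if not (len(x) == len(y) == len(shape)):
        raise ValueError("coordinates and shape must have the same number of dimensions")
    return sum(ring_distance(a, b, n) for a, b, n in zip(x, y, shape))


# Hypothetical 5D partition shape (A, B, C, D, E) and node coordinates;
# illustrative values only, not a real BG/Q partition.
SHAPE = (4, 4, 4, 4, 2)
node_a = (0, 0, 0, 0, 0)
node_b = (3, 1, 2, 0, 1)

print(torus_hops(node_a, node_b, SHAPE))  # 5
print(torus_hops(node_a, node_a, SHAPE))  # 0: both ranks on one node, no torus links used
```

Under these assumptions, a mapping that places frequently communicating ranks on the same node, or on nearby nodes, reduces the number of torus links their messages cross, which is the kind of effect the network counters can expose.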

Original language: English
Title of host publication: Supercomputing - 28th International Supercomputing Conference, ISC 2013, Proceedings
Pages: 213-225
Number of pages: 13
State: Published - 2013
Externally published: Yes
Event: 28th International Supercomputing Conference on Supercomputing, ISC 2013 - Leipzig, Germany
Duration: Jun 16 2013 to Jun 20 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7905 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Supercomputing Conference on Supercomputing, ISC 2013
Country/Territory: Germany
City: Leipzig
Period: 06/16/13 to 06/20/13
