Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements

Daniel Barry, Heike Jagode, Anthony Danalis, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Some of the most important categories of performance events count the data traffic between the processing cores and the main memory. However, since these counters are not core-private, applications require elevated privileges to access them. PAPI offers a component that can access this information on IBM systems through the Performance Co-Pilot (PCP); however, doing so adds an indirection layer that involves querying the PCP daemon. This paper performs a quantitative study of the accuracy of the measurements obtained through this component on the Summit supercomputer. We use two linear algebra kernels - a generalized matrix multiply and a modified matrix-vector multiply - as benchmarks and a distributed, GPU-accelerated 3D-FFT mini-app (using cuFFT) to compare the measurements obtained through the PAPI PCP component against the expected values across different problem sizes. We also compare our measurements against an in-house machine with a very similar architecture to Summit, where elevated privileges allow PAPI to access the hardware counters directly (without using PCP), to show that measurements taken via PCP are as accurate as those taken directly. Finally, using both QMCPACK and the 3D-FFT, we demonstrate the diverse hardware activities that can be monitored simultaneously via PAPI hardware components.

Original language: English
Title of host publication: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 393-402
Number of pages: 10
ISBN (Electronic): 9798350311990
State: Published - 2023
Externally published: Yes
Event: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023 - St. Petersburg, United States
Duration: May 15 2023 to May 19 2023

Publication series

Name: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023

Conference

Conference: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023
Country/Territory: United States
City: St. Petersburg
Period: 05/15/23 to 05/19/23

Funding

ACKNOWLEDGMENT: We thank the anonymous reviewers for their improvement suggestions. This research was supported in part by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration; and by the National Science Foundation under award No. 1900888 “ANACIN-X.”

Keywords

  • GPU power
  • PAPI
  • high performance computing
  • memory bandwidth
  • network traffic
  • performance analysis
  • performance counters
