Lessons Learned from Optimizing Kernels for Adaptive Aggregation Multi-grid Solvers in Lattice QCD

Bálint Joó, Thorsten Kurth

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

In recent years, adaptive aggregation multi-grid (AAMG) methods have become the gold standard for solving the Dirac equation in Lattice QCD (LQCD) using Wilson-Clover fermions. These methods overcome the critical slowing down that occurs as quark masses approach their physical values, and are thus the go-to method for performing Lattice QCD calculations at realistic physical parameters. In this paper we discuss the optimization of a specific building block for implementing AAMG for Wilson-Clover fermions from LQCD, known as the coarse restrictor operator, on contemporary Intel processors featuring large SIMD widths and high thread counts. We discuss in detail the efficient use of OpenMP and Intel vector intrinsics in our attempts to exploit fine-grained parallelism on the coarsest levels. We present performance optimizations and discuss the ramifications for implementing a full AAMG stack on Intel Xeon Phi Knights Landing and Skylake processors.

Original language: English
Title of host publication: High Performance Computing - ISC High Performance 2018 International Workshops, Revised Selected Papers
Editors: Rio Yokota, Michèle Weiland, Sadaf Alam, John Shalf
Publisher: Springer Verlag
Pages: 472-486
Number of pages: 15
ISBN (Print): 9783030024642
State: Published - 2018
Externally published: Yes
Event: International Conference on High Performance Computing, ISC High Performance 2018 - Frankfurt, Germany
Duration: Jun 28 2018 to Jun 28 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11203 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on High Performance Computing, ISC High Performance 2018
Country/Territory: Germany
City: Frankfurt
Period: 06/28/18 to 06/28/18

Funding

Acknowledgment. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and of the ALCF, which is supported by DOE/SC under contract DE-AC02-06CH11357. B. Joo acknowledges funding from the DOE Office of Science, Offices of Nuclear Physics and Advanced Scientific Computing Research through the SciDAC program. B. Joo also acknowledges support from the U.S. DOE Exascale Computing Project (ECP). This work is supported by the U.S. Department of Energy, Office of Science, Office of Nuclear Physics under contract DE-AC05-06OR23177. B. Joo would like to thank and acknowledge Kate Clark of NVIDIA for many discussions about expressing and mapping parallelism in multi-grid solver components in a variety of programming models and hardware, and for her helpful comments after a reading of this manuscript, as well as Christian Trott of Sandia Labs for discussions about nested parallelism in OpenMP. This work used resources provided by the Performance Research Laboratory at the University of Oregon. We would especially like to thank Sameer Shende and Rob Yelle for their professional support of the Performance Research Laboratory computers and their timely response to our requests.

Funders (funder number)
DOE Office of Science
DOE/SC: DE-AC02-06CH11357
Office of Nuclear Physics: DE-AC05-06OR23177
U.S. DOE
U.S. Department of Energy: DE-AC02-05CH11231
Office of Science
Advanced Scientific Computing Research
