Optimizing a multiple right-hand side Dslash kernel for Intel knights corner

  • Aaron Walden
  • , Sabbir Khan
  • , Bálint Joó
  • , Desh Ranjan
  • , Mohammad Zubair

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

There is a significant interest in the computational physics community to perform lattice quantum chromodynamics (LQCD) simulations, which can run into the trillions of operations. LQCD computations solve a sparse linear system using a Wilson Dslash kernel, which has an arithmetic intensity of 0.88–2.29. This makes Dslash memory bandwidth-bound on most architectures, including Intel Xeon Phi Knights Corner (KNC). Most research optimizing the Dslash operator has been focused on single right-hand side (SRHS) linear solvers. There is a class of LQCD computations which aims to solve systems with multiple right-hand sides (MRHS), presenting additional opportunities for data reuse and vectorization. We present two approaches to MRHS Dslash: a vector register blocking approach and one using the software package QPhiX with a custom code generator for low-level intrinsics.We observed significant speedups using our approaches, with sustained performance of over 700 GFLOPS (single precision) in one instance. We achieved up to 29% of theoretical peak performance compared to a maximum of 13% obtained by the previous SRHS method using QPhiX.

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume9945 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

ConferenceInternational Workshops on High Performance Computing, ISC High Performance 2016 and Workshop on 2nd International Workshop on Communication Architectures at Extreme Scale, ExaComm 2016, Workshop on Exascale Multi/Many Core Computing Systems, E-MuCoCoS 2016, HPC I/O in the Data Center, HPC-IODC 2016, Application Performance on Intel Xeon Phi – Being Prepared for KNL and Beyond, IXPUG 2016, International Workshop on OpenPOWER for HPC, IWOPH 2016, International Workshop on Performance Portable Programming Models for Accelerators, P^3MA 2016, Workshop on Virtualization in High-Performance Cloud Computing, VHPC 2016, Workshop on Performance and Scalability of Storage Systems, WOPSSS 2016
Country/TerritoryGermany
CityFrankfurt
Period06/19/1606/23/16

Funding

Notice: Authored by Jefferson Science Associates, LLC under U.S. DOE Contract No. DE-AC05-06OR23177. The U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce this manuscript for U.S. Government purposes. This work was partially supported by a grant from Jefferson Lab. Aaron Walden and Sabbir Khan were also partially supported by the Old Dominion University Modeling and Simulation Fellowship Program and gratefully acknowledge this support. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Nuclear Physics under contract DE-AC05-06OR23177.

Keywords

  • Code generator
  • LQCD
  • Optimization
  • Parallel programming
  • Performance
  • Vectorization
  • Wilson-Dslash
  • Xeon Phi Knights Corner

Fingerprint

Dive into the research topics of 'Optimizing a multiple right-hand side Dslash kernel for Intel knights corner'. Together they form a unique fingerprint.

Cite this