Lattice QCD with Domain Decomposition on Intel® Xeon Phi™ Co-Processors

Simon Heybrock, Balint Joó, Dhiraj D. Kalamkar, Mikhail Smelyanskiy, Karthikeyan Vaidyanathan, Tilo Wettig, Pradeep Dubey

Research output: Contribution to journal › Conference article › peer-review

31 Scopus citations

Abstract

The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel® Xeon Phi™ co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.

Original language: English
Article number: 7012993
Pages (from-to): 69-80
Number of pages: 12
Journal: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume: 2015-January
Issue number: January
DOIs
State: Published - Jan 16 2014
Externally published: Yes
Event: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014 - New Orleans, United States
Duration: Nov 16 2014 to Nov 21 2014

Funding

Funder: Deutsche Forschungsgemeinschaft; funder number: SFB/TR 55
Funder: Directorate for Computer and Information Science and Engineering; funder number: 1238993

    Keywords

    • Domain decomposition
    • Intel® Xeon Phi™ coprocessor
    • Lattice QCD
