High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs

Natalie Beams, Ahmad Abdelfattah, Stan Tomov, Jack Dongarra, Tzanio Kolev, Yohann Dudouit

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

We present new GPU implementations of the tensor contractions arising from basis-related computations for high-order finite element methods. We consider both tensor and non-tensor bases. In the case of tensor bases, we introduce new kernels based on a series of fused device-level matrix multiplications (GEMMs), specifically designed to utilize the fast memory of the GPU. For non-tensor bases, we develop a tuned framework for choosing standard batch-BLAS GEMMs that will maximize performance across groups of elements. The implementations are included in a backend of the libCEED library. We present benchmark results for the diffusion and mass operators using libCEED integration through the MFEM finite element library and compare to those of the previously best-performing GPU backends for stand-alone basis computations. In tensor cases, we see improvements of approximately 10-30% for some cases, particularly for higher basis orders. For the non-tensor tests, the new batch-GEMMs implementation is twice as fast as what was previously available for basis function order greater than five and greater than approximately 10^5 degrees of freedom in the mesh; up to ten times speedup is seen for eighth-order basis functions.
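The distinction the abstract draws between tensor and non-tensor bases can be illustrated with a small CPU-only sketch. It is not the paper's GPU code and does not use libCEED or any BLAS library: `gemm`, `kron`, `interp_tensor`, and `interp_dense_batch` are hypothetical helper names, and plain Python lists stand in for device memory. Under those assumptions, it shows that a 2D tensor-product basis can be applied to one element as two small matrix products, while a non-tensor (dense) basis can be applied to a whole group of elements with a single larger product.

```python
# Illustrative sketch only: pure-Python stand-ins for GEMM calls.
# All function names here are hypothetical, not libCEED or BLAS APIs.

def gemm(A, B):
    """Dense matrix product A @ B, with matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def kron(A, B):
    """Kronecker product of A and B (row-major block ordering)."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def interp_tensor(B1d, U):
    """Tensor basis in 2D: V = B1d @ U @ B1d^T, i.e. two small GEMMs
    applied one after the other to a single element's P x P nodal values."""
    return gemm(gemm(B1d, U), transpose(B1d))

def interp_dense_batch(B, u_batch):
    """Non-tensor basis: one GEMM for all elements at once.
    u_batch holds one flattened nodal vector per element; each element
    becomes a column of the right-hand matrix."""
    U = transpose(u_batch)            # shape (num_nodes, num_elements)
    return transpose(gemm(B, U))      # shape (num_elements, num_quad_points)

if __name__ == "__main__":
    # Linear 1D basis evaluated at the points 0, 0.5, 1 (assumed example data).
    B1d = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    U = [[1.0, 2.0], [3.0, 4.0]]
    flat = [x for row in U for x in row]
    print(interp_tensor(B1d, U))
    print(interp_dense_batch(kron(B1d, B1d), [flat]))
```

For a tensor basis both paths give the same values, but the first multiplies by two small matrices instead of one matrix whose size grows with the square of the 1D basis size in 2D. A non-tensor basis has no such factorization, so only the dense, batched form applies.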

Original language: English
Title of host publication: Proceedings of ScalA 2020
Subtitle of host publication: 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 53-60
Number of pages: 8
ISBN (Electronic): 9781665422703
DOIs
State: Published - Nov 2020
Externally published: Yes
Event: 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2020 - Virtual, Atlanta, United States
Duration: Nov 13 2020 to Nov 13 2020

Publication series

Name: Proceedings of ScalA 2020: 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2020
Country/Territory: United States
City: Virtual, Atlanta
Period: 11/13/20 to 11/13/20

Funding

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, LLNL-CONF-815596. This research was supported by NVIDIA and the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations (the Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation's exascale computing imperative.

Funders (with funder number where listed)
DOE organizations
U.S. Department of Energy
Office of Science
National Nuclear Security Administration
Lawrence Livermore National Laboratory: LLNL-CONF-815596, DE-AC52-07NA27344
NVIDIA: 17-SC-20-SC

    Keywords

    • GPU
    • Tensor contractions
    • batched linear algebra
    • finite elements
    • high-order methods
    • matrix-free FEM
