Optimization with the OpenACC-to-FPGA framework on the Arria 10 and Stratix 10 FPGAs

Jacob Lambert, Seyong Lee, Jeffrey S. Vetter, Allen D. Malony

Research output: Contribution to journal › Article › peer-review

Abstract

The reconfigurable computing paradigm with field programmable gate arrays (FPGAs) has received renewed interest in the high-performance computing field due to FPGAs’ unique combination of performance and energy efficiency. However, difficulties in programming and optimizing FPGAs have prevented them from being widely accepted as general-purpose computing devices. In accelerator-based heterogeneous computing, portability across diverse heterogeneous devices is also an important issue, but the unique architectural features of FPGAs make this difficult to achieve. To address these issues, a directive-based, high-level FPGA programming and optimization framework was previously developed. In this work, we combine the previously developed optimizations holistically using the directive-based approach and show that each individual benchmark requires a unique set of optimizations to maximize performance. We perform this exploration on Intel Arria 10 and Stratix 10 FPGAs. We also explore the relationships between performance, resource usage, and compilation time, and investigate the implications for performance portability. Finally, we present an initial evaluation of a real-world proxy application, LULESH.

Original language: English
Article number: 102784
Journal: Parallel Computing
Volume: 104-105
State: Published - Jul 2021

Funding

This research was supported in part by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the US Department of Energy (DOE) Office of Science (OS) and the National Nuclear Security Administration, United States. This material is based upon work supported by the DOE OS Advanced Scientific Computing Research under contract number DE-AC05-00OR22725. The US government retains (and the publisher, by accepting the article for publication, acknowledges that the US government retains) a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). The authors would like to acknowledge the ORNL Experimental Computing Laboratory (ExCL) team for its support with the compute resources and the software stack. This work was also partially supported by the Center for Computational Sciences (CCS), University of Tsukuba, Japan. We thank CCS for access to the Pre-PACS-X (PPX) cluster.

Funders (funder number, where listed)
DOE Public Access Plan
ExCL
ORNL Experimental Computing Laboratory
U.S. Department of Energy
Office of Science
National Nuclear Security Administration
Advanced Scientific Computing Research: DE-AC05-00OR22725
Government of South Australia
University of Tsukuba: 17-SC-20-SC

    Keywords

    • Compiler optimization
    • Directive-based programming
    • FPGA
    • OpenACC
    • OpenARC
