Abstract
The reconfigurable computing paradigm with field programmable gate arrays (FPGAs) has received renewed interest in the high-performance computing field due to FPGAs’ unique combination of performance and energy efficiency. However, difficulties in programming and optimizing FPGAs have prevented them from being widely accepted as general-purpose computing devices. In accelerator-based heterogeneous computing, portability across diverse heterogeneous devices is also an important issue, but the unique architectural features of FPGAs make this difficult to achieve. To address these issues, a directive-based, high-level FPGA programming and optimization framework was previously developed. In this work, the optimizations developed for that framework were combined holistically using the directive-based approach, showing that each individual benchmark requires a unique set of optimizations to maximize performance. We performed this exploration on Intel Arria 10 and Stratix 10 FPGAs. We also explored the relationships between performance, resource usage, and compilation time, and investigated the implications for performance portability. Finally, we present an initial evaluation of a real-world proxy application, LULESH.
Original language | English |
---|---|
Article number | 102784 |
Journal | Parallel Computing |
Volume | 104-105 |
DOIs | |
State | Published - Jul 2021 |
Funding
This research was supported in part by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the US Department of Energy (DOE) Office of Science (OS) and the National Nuclear Security Administration, United States. This material is based upon work supported by the DOE OS Advanced Scientific Computing Research under contract number DE-AC05-00OR22725. The US government retains, and the publisher, by accepting the article for publication, acknowledges that the US government retains, a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). The authors would like to acknowledge the ORNL Experimental Computing Laboratory (ExCL) team for its support with the compute resources and the software stack. This work was also partially supported by the Center for Computational Sciences (CCS), University of Tsukuba, Japan. We thank CCS for access to the Pre-PACS-X (PPX) cluster.
Funders | Funder number |
---|---|
DOE Public Access Plan | |
ExCL | |
ORNL Experimental Computing Laboratory | |
U.S. Department of Energy | |
Office of Science | |
National Nuclear Security Administration | |
Advanced Scientific Computing Research | DE-AC05-00OR22725 |
Government of South Australia | |
Exascale Computing Project | 17-SC-20-SC |
University of Tsukuba | |
Keywords
- Compiler optimization
- Directive-based programming
- FPGA
- OpenACC
- OpenARC