Abstract
As deep learning on edge computing systems has become more prevalent, investigating architectures and configurations for optimal inference performance has become a critical step in proposing artificial intelligence solutions. While there has been considerable work on hardware and software for high-performance inference, little is known about the performance of such systems on HPC architectures. In this paper, we address outstanding questions about parallel inference performance on HPC systems. We report results and recommendations derived from evaluating iBench on multiple platforms in a variety of HPC configurations. We systematically benchmark single-GPU, single-node, and multi-node performance for maximum client-side and server-side inference throughput. We show that, to achieve linear speedup, concurrent sending clients must be used rather than large batch payloads parallelized across multiple GPUs. We also show that client/server inference architectures add a considerable data transfer component, which benchmarks such as MLPerf do not measure and which must be taken into account when benchmarking HPC systems. Finally, we investigate the energy efficiency of GPUs at different levels of concurrency and batch sizes, and report the optimal configurations that minimize cost per inference.
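To make the concurrency finding concrete, the following is a minimal illustrative sketch, not the paper's iBench code: it contrasts aggregate throughput for varying numbers of concurrent clients. The function `send_inference_request`, the batch size, and the simulated per-request latency are all hypothetical stand-ins for a real client-to-server inference call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 32            # samples per request (assumed)
REQUESTS_PER_CLIENT = 50   # requests each client issues (assumed)

def send_inference_request(batch_size: int) -> int:
    """Hypothetical stand-in for a real client->server inference call
    (e.g., an HTTP POST to a model server). Here it only simulates a
    fixed per-request latency covering data transfer plus GPU compute."""
    time.sleep(0.01)
    return batch_size  # samples processed by this request

def run_clients(num_clients: int) -> float:
    """Aggregate throughput (samples/sec) for num_clients concurrent
    clients, each sending REQUESTS_PER_CLIENT small-batch requests."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        futures = [pool.submit(send_inference_request, BATCH_SIZE)
                   for _ in range(num_clients * REQUESTS_PER_CLIENT)]
        total = sum(f.result() for f in futures)
    return total / (time.perf_counter() - start)

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n:2d} concurrent clients: {run_clients(n):9.1f} samples/sec")
```

Because each simulated request has a fixed latency, throughput in this sketch scales roughly linearly with the number of concurrent clients, mirroring the behavior the abstract reports for concurrent clients on real multi-GPU servers.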
Original language | English |
---|---|
Title of host publication | 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781728192192 |
State | Published - Sep 22 2020 |
Externally published | Yes |
Event | 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020 - Virtual, Waltham, United States. Duration: Sep 21 2020 → Sep 25 2020 |
Publication series
Name | 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020 |
---|---|
Conference
Conference | 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020 |
---|---|
Country/Territory | United States |
City | Virtual, Waltham |
Period | 09/21/20 → 09/25/20 |
Funding
This material is based upon work supported by, or in part by, the Department of Defense High Performance Computing Modernization Program (HPCMP) under User Productivity, Enhanced Technology Transfer, and Training (PET) contracts #GS04T09DBC0017 and #47QFSA18K0111. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the DoD HPCMP.
Keywords
- GPU
- MLPerf
- ResNet50
- benchmark
- distributed
- inference