Abstract
We present training and inference benchmarks on a POWER9 edge supercomputer called SCOUT using two production-level artificial intelligence systems: the Video Processing Exploitation Framework (VPEF) and the Ordnance Threat Target Automated Recognition (OTTAR) system. For training benchmarks, we use Horovod to train ResNet-50 on synthetic data as a baseline, and then train Faster R-CNN on an available WASABI dataset using SCOUT's six-way V100 GPU nodes. For inference benchmarks, we use GStreamer to stream both synthetic and real Motion Imagery (MI), then run a Single-Shot Detector (SSD) with a MobileNetV2 backbone on the real and synthetic MI at four resolutions (720p, 1080p, 1280p, and 2160p) while measuring average inference time. We also test reduced-precision inference performance using TensorRT and distributed inference performance using the Inference Benchmark (iBench) framework. We compare our results with equivalent work on x86_64 systems and provide tuning suggestions for optimal performance. We find that V100 GPUs work well for training and for offline batch inference, while T4 GPUs outperform V100 GPUs only in very specific usage scenarios, such as streaming detection at reduced precision.
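The average-inference-time measurement described above can be sketched as a simple timing loop. This is a minimal illustration only, assuming a hypothetical `infer` callable that stands in for the SSD/MobileNetV2 detector; the helper names, the zero-filled synthetic frames, the frame dimensions, and the warm-up count are all assumptions, not the VPEF, OTTAR, or iBench code. The 1280p setting is left out because its frame dimensions are not stated in the abstract.

```python
import time
from typing import Callable, Dict, Tuple

# Assumed (width, height) for the standard resolutions named in the abstract.
# 1280p is omitted because its exact frame dimensions are not given.
RESOLUTIONS: Dict[str, Tuple[int, int]] = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "2160p": (3840, 2160),
}


def average_inference_time(infer: Callable[[bytes], object], frame: bytes,
                           iterations: int = 100, warmup: int = 10) -> float:
    """Return the mean wall-clock seconds per call of `infer` on `frame`."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    # Warm-up calls are not timed (they absorb lazy initialization, caching, etc.).
    for _ in range(warmup):
        infer(frame)
    start = time.perf_counter()
    for _ in range(iterations):
        infer(frame)
    return (time.perf_counter() - start) / iterations


def benchmark(infer: Callable[[bytes], object], iterations: int = 100,
              warmup: int = 10) -> Dict[str, float]:
    """Time `infer` on one zero-filled synthetic RGB frame per resolution."""
    results: Dict[str, float] = {}
    for name, (width, height) in RESOLUTIONS.items():
        frame = bytes(width * height * 3)  # 3 bytes per pixel (RGB), all zeros
        results[name] = average_inference_time(infer, frame, iterations, warmup)
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a real detector: touches part of the frame.
    def dummy_infer(frame: bytes) -> int:
        return sum(frame[:4096])

    for res, seconds in benchmark(dummy_infer, iterations=20, warmup=2).items():
        print(f"{res}: {seconds * 1000:.3f} ms/frame")
```

Reusing one frame per resolution keeps memory bounded (a 2160p RGB frame is about 25 MB); a streaming setup such as the GStreamer pipeline in the abstract would instead feed distinct decoded frames, so the sketch measures model time only, not decode or transfer cost.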
| Original language | English |
|---|---|
| Title of host publication | 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781665423694 |
| DOIs | |
| State | Published - 2021 |
| Externally published | Yes |
| Event | 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021 - Virtual, Online, United States. Duration: Sep 20 2021 → Sep 24 2021 |
Publication series
| Name | 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021 |
|---|---|
Conference
| Conference | 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021 |
|---|---|
| Country/Territory | United States |
| City | Virtual, Online |
| Period | 09/20/21 → 09/24/21 |
Funding
DISTRIBUTION STATEMENT A. Approved for public release. This material is based upon work supported by, or in part by, the Department of Defense High Performance Computing Modernization Program (HPCMP) under User Productivity, Enhanced Technology Transfer, and Training (PET) contracts #GS04T09DBC0017 and #47QFSA18K0111. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the DoD HPCMP.
Keywords
- GPU
- MI
- Motion Imagery
- TensorRT
- artificial intelligence
- benchmark
- classification
- distributed
- inference
- object detection
- power9
- training