Base64 encoding on heterogeneous computing platforms

Zheming Jin, Hal Finkel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Base64 encoding has many applications on the Web. Previous studies investigated optimizations of the Base64 encoding algorithm on central processing units (CPUs). In this paper, we describe optimizations of the algorithm on heterogeneous computing platforms. More specifically, we explain the algorithm, convert it to kernels written in CUDA C/C++ and Open Computing Language (OpenCL), optimize the CUDA and OpenCL applications with CUDA and OpenCL streams, which can overlap data transfers with kernel computations, and vectorize the CUDA and OpenCL kernels to improve kernel throughput. We evaluate the impact of the number of streams on kernel performance on an NVIDIA Pascal P100 graphics processing unit (GPU) and a Nallatech 385A card that features an Intel Arria 10 GX1150 field-programmable gate array (FPGA). We also measure the performance and power of the applications on the CPU, GPU, and FPGA to understand the advantage of each platform and the benefit of kernel offloading. The experiments show that using vector data types in the kernels does not by itself improve performance, and that launching more work-items is better than using larger vectors per work-item on the GPU. OpenCL and CUDA streams achieve almost the same performance on the GPU, but streams should be used with caution when GPU resources are underutilized. On the FPGA, kernel vectorization using 16 vector lanes achieves the highest performance when the number of streams is one. However, increasing the vector width per work-item together with the number of streams shortens the kernel computation time of each stream, and thereby reduces the number of concurrent operations across the streams. While the raw performance on the GPU is 3.1X higher than on the FPGA, the FPGA consumes 3.4X less power. A comparison with a state-of-the-art implementation on an Intel CPU server shows an increasing benefit of kernel offloading.

Original language: English
Title of host publication: Proceedings - 2019 IEEE 30th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 247-254
Number of pages: 8
ISBN (Electronic): 9781728116013
DOIs
State: Published - Jul 2019
Externally published: Yes
Event: 30th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2019 - New York, United States
Duration: Jul 15 2019 - Jul 17 2019

Publication series

Name: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors
Volume: 2019-July
ISSN (Print): 1063-6862

Conference

Conference: 30th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2019
Country/Territory: United States
City: New York
Period: 07/15/19 - 07/17/19

Funding

We are grateful to the reviewers for their constructive criticism. The research was supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357 and made use of the Argonne Leadership Computing Facility, a DOE Office of Science User Facility.

Funders (funder number):
• U.S. Department of Energy
• Office of Science (DE-AC02-06CH11357)

Keywords

• Base64 encoding
• CUDA
• FPGA
• GPU
• Heterogeneous computing
• OpenCL
• Stream
