Fine-grained exploitation of mixed precision for faster CNN training

Jeremy T. Johnston, Steven R. Young, Catherine D. Schuman, Junghoon Chae, Don D. March, Robert M. Patton, Thomas E. Potok

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

As deep convolutional neural networks (CNNs) have become increasingly popular and successful at an ever-widening range of machine learning tasks, specialized hardware has become increasingly available for training and deploying them. NVIDIA's recent Volta architecture includes tensor cores, which perform a fused multiply-accumulate operation in reduced and mixed precision (16-bit multiply, 32-bit accumulate). Recent research indicates that, typically, very little training accuracy is lost when half precision is used in place of single precision, and that performance gains can be made by doing arithmetic in reduced precision. In this work we demonstrate that making layer-by-layer choices of arithmetic/data precision can lead to further performance improvement. In our study of 25,200 CNNs we demonstrate an average speedup (over purely half precision) of 1.27x, and speedups as high as 3.64x, by appropriately combining single- and half-precision arithmetic and data types on a layer-by-layer basis.
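To make the tensor-core semantics mentioned in the abstract concrete, the following NumPy sketch (illustrative only, not the paper's code) simulates a dot product three ways: pure half precision (float16 multiplies and float16 accumulation), the mixed scheme (float16 multiplies with float32 accumulation, as in Volta's fused operation), and a float64 reference. The vector size and random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n).astype(np.float16)
w = rng.standard_normal(n).astype(np.float16)

# Elementwise products computed in half precision (16-bit multiply).
products = x * w  # dtype float16

# Pure half precision: the running sum is also kept in float16,
# so rounding error accumulates at float16 granularity.
acc16 = np.float16(0.0)
for p in products:
    acc16 = np.float16(acc16 + p)

# Tensor-core-style mixed precision: same float16 products,
# but accumulated into a 32-bit register.
acc32 = np.float32(0.0)
for p in products:
    acc32 += np.float32(p)

# Float64 reference on the same (already float16-rounded) inputs.
ref = float(np.dot(x.astype(np.float64), w.astype(np.float64)))

print(f"fp16 accumulate error: {abs(float(acc16) - ref):.4f}")
print(f"fp32 accumulate error: {abs(float(acc32) - ref):.4f}")
```

Running this shows the mixed scheme tracking the reference far more closely than pure half precision, which is why very little accuracy is typically lost when only the multiplies are done in float16.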

Original language: English
Title of host publication: Proceedings of MLHPC 2019
Subtitle of host publication: 5th Workshop on Machine Learning in HPC Environments - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 9-18
Number of pages: 10
ISBN (Electronic): 9781728159850
DOIs
State: Published - Nov 2019
Event: 5th IEEE/ACM Workshop on Machine Learning in HPC Environments, MLHPC 2019 - Denver, United States
Duration: Nov 18 2019 → …

Publication series

Name: Proceedings of MLHPC 2019: 5th Workshop on Machine Learning in HPC Environments - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 5th IEEE/ACM Workshop on Machine Learning in HPC Environments, MLHPC 2019
Country/Territory: United States
City: Denver
Period: 11/18/19 → …

Funding

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Robinson Pino, program manager, under contract number DE-AC05-00OR22725.

Funders (funder number):
U.S. Department of Energy (DE-AC05-00OR22725)
Office of Science
Advanced Scientific Computing Research
