Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics

Dalton Lunga, Jonathan Gerrand, Lexie Yang, Christopher Layton, Robert Stewart

Research output: Contribution to journal › Article › peer-review

41 Scopus citations

Abstract

The sheer volumes of data generated from earth observation and remote sensing technologies continue to make a major impact, pushing key geospatial applications into the dual data- and compute-intensive era. As a consequence, this rapid advancement poses new computational and data processing challenges. We implement a novel remote sensing data flow (RESFlow) for advancing machine learning to compute with massive amounts of remotely sensed imagery. The core contribution is partitioning massive amounts of data into homogeneous distributions for fitting simple models. RESFlow takes advantage of Apache Spark and the availability of modern computing hardware to harness the acceleration of deep learning inference on expansive remote sensing imagery. The framework incorporates a strategy to optimize resource utilization across multiple executors assigned to a single worker. We showcase its deployment in both computationally and data-intensive workloads for pixel-level labeling tasks. The pipeline invokes deep learning inference at three stages: deep feature extraction, deep metric mapping, and deep semantic segmentation. The tasks impose compute-intensive and GPU resource-sharing challenges, motivating a parallelized pipeline for all execution steps. To address the problem of hardware resource contention, our containerized workflow further incorporates a novel GPU checkout routine and a ticketing system across multiple workers. The workflow is demonstrated on NVIDIA DGX accelerated platforms and offers appreciable compute speed-ups for deep learning inference on pixel labeling workloads, processing 21 028 TB of imagery data and delivering output maps at an area rate of 5.245 sq.km/s, amounting to 453 168 sq.km/day and reducing a 28 day workload to 21 h.
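The abstract mentions a GPU checkout routine and ticketing system but gives no implementation detail. As a rough illustration of the general idea only, the sketch below treats each GPU id as a ticket in a shared queue: a task must check a ticket out before running inference and returns it afterwards, so no two tasks hold the same GPU at once. This is not RESFlow's code. The names `GpuTicketPool`, `checkout`, and `run_tiles` are hypothetical, Python threads stand in for Spark executors, and the inference call is a placeholder.

```python
import queue
import threading
from contextlib import contextmanager


class GpuTicketPool:
    """Hypothetical pool that hands out GPU ids as exclusive tickets."""

    def __init__(self, gpu_ids):
        self._tickets = queue.Queue()
        for gpu_id in gpu_ids:
            self._tickets.put(gpu_id)

    @contextmanager
    def checkout(self, timeout=None):
        # Blocks until some GPU ticket is free (or raises queue.Empty on timeout).
        gpu_id = self._tickets.get(timeout=timeout)
        try:
            yield gpu_id
        finally:
            # Always return the ticket, even if the task fails.
            self._tickets.put(gpu_id)

    def available(self):
        """Approximate number of GPUs not currently checked out."""
        return self._tickets.qsize()


def run_tiles(pool, tiles, n_workers=4):
    """Process tiles with n_workers threads (stand-ins for executors).

    Returns (assignments, conflicts): a tile -> gpu_id mapping and the number
    of times a GPU was observed in use by two tasks at once (should be 0).
    """
    assignments = {}
    in_use = set()
    conflicts = [0]
    lock = threading.Lock()

    def worker(my_tiles):
        for tile in my_tiles:
            with pool.checkout() as gpu_id:
                with lock:
                    if gpu_id in in_use:
                        conflicts[0] += 1
                    in_use.add(gpu_id)
                # Placeholder: a real task would run model inference on gpu_id here.
                with lock:
                    in_use.discard(gpu_id)
                    assignments[tile] = gpu_id

    threads = [
        threading.Thread(target=worker, args=(tiles[i::n_workers],))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return assignments, conflicts[0]


if __name__ == "__main__":
    pool = GpuTicketPool([0, 1])
    result, n_conflicts = run_tiles(pool, [f"tile-{i}" for i in range(8)])
    print(len(result), "tiles labeled;", n_conflicts, "conflicts")
```

In an actual multi-process Spark deployment the ticket store would have to be visible across processes and workers (for example a lock file or a small coordination service), which an in-memory queue is not; that part is left out of this sketch.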

Original language: English
Article number: 8949817
Pages (from-to): 271-283
Number of pages: 13
Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Volume: 13
DOI: 10.1109/JSTARS.2019.2959707
State: Published - 2020

Funding

Manuscript received June 20, 2019; revised November 9, 2019; accepted December 4, 2019. Date of publication January 2, 2020; date of current version February 12, 2020. This study was supported by the National Security Sciences Directorate, Oak Ridge National Laboratory. (Corresponding author: Dalton Lunga.) The authors are with the National Security Sciences Directorate, Oak Ridge National Laboratory, Oak Ridge, TN 37830 USA (e-mail: lungadd@ornl.gov; [email protected]; [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/JSTARS.2019.2959707.

Additionally, we would like to acknowledge that this manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

Keywords

  • Big data applications
  • high performance computing
  • image classification
  • inference mechanisms
  • machine learning
  • supervised learning
