On-demand data analytics in HPC environments at leadership computing facilities: Challenges and experiences

John Harney, Seung Hwan Lim, Sreenivas Sukumar, Dale Stansberry, Peter Xenopoulos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

The construction of data analysis infrastructures that handle continuously accumulating data is quickly becoming an essential requirement for many organizations such as the U.S. Department of Energy (DOE). While DOE supports some of the largest computing facilities in the world, new analysis infrastructures like Apache Spark are difficult to implement. In this paper, we propose an on-demand Spark service that mitigates these difficulties, allowing facility users to flexibly create Spark instances quickly and easily. We define a systematic approach for creating these Spark instances and validate that optimal performance benefits are maintained. Using a series of benchmarks for algorithms that are commonly used in scientific workflows, we compared the behavior of Spark tasks using facility resources with that of an open research cloud that has a dedicated Spark infrastructure deployed. Finally, we leveraged a scientific use case from the Center of Nanophase Materials at the Oak Ridge National Laboratory to demonstrate the utility of using Spark in the computing facility.

Original language: English
Title of host publication: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
Editors: Ronay Ak, George Karypis, Yinglong Xia, Xiaohua Tony Hu, Philip S. Yu, James Joshi, Lyle Ungar, Ling Liu, Aki-Hiro Sato, Toyotaro Suzumura, Sudarsan Rachuri, Rama Govindaraju, Weijia Xu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2087-2096
Number of pages: 10
ISBN (Electronic): 9781467390040
State: Published - 2016
Event: 4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
Duration: Dec 5 2016 to Dec 8 2016

Publication series

Name: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

Conference

Conference: 4th IEEE International Conference on Big Data, Big Data 2016
Country/Territory: United States
City: Washington
Period: 12/5/16 to 12/8/16

Keywords

  • HPC
  • data analytics
  • distributed computing
