Integrating Apache Spark into PBS-Based HPC Environments

Troy Baer, Paul Peltz, Junqi Yin, Edmon Begoli

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

This paper describes an effort at the University of Tennessee's National Institute for Computational Sciences (NICS) to integrate Apache Spark into the widely used TORQUE HPC batch environment. The similarities and differences between the execution of a Spark program and that of an MPI program on a cluster are used to motivate how to implement Spark/TORQUE integration. An implementation of this integration, pbs-spark-submit, is described, including demonstrations of functionality on two HPC clusters and a large shared-memory system.

Original language: English
Title of host publication: Proceedings of the XSEDE 2015 Conference
Subtitle of host publication: Scientific Advancements Enabled by Enhanced Cyberinfrastructure
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450337205
State: Published - Jul 26 2015
Externally published: Yes
Event: 4th Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2015 - St. Louis, United States
Duration: Jul 26 2015 to Jul 30 2015

Publication series

Name: ACM International Conference Proceeding Series
Volume: 2015-July

Conference

Conference: 4th Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2015
Country/Territory: United States
City: St. Louis
Period: 07/26/15 to 07/30/15

Bibliographical note

Publisher Copyright:
Copyright © 2015 ACM.

Keywords

  • Apache Spark
  • Batch processing
  • Data analytics
  • NICS
  • PBS
  • TORQUE
