Abstract
This paper describes an effort at the University of Tennessee's National Institute for Computational Sciences (NICS) to integrate Apache Spark into TORQUE, a widely used HPC batch environment. The similarities and differences between executing a Spark program and executing an MPI program on a cluster motivate the design of the Spark/TORQUE integration. The paper then describes an implementation of this integration, pbs-spark-submit, and demonstrates its functionality on two HPC clusters and a large shared-memory system.
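To make the batch-integration idea concrete, the following is a minimal sketch, not the pbs-spark-submit source. It relies on two facts: TORQUE lists one line per allocated processor slot in the file named by `$PBS_NODEFILE`, and a Spark standalone master listens on port 7077 by default. The policy it encodes is an assumption: the first listed host runs the Spark master and every allocated host runs a worker sized to its slot count. The helper names `read_nodefile` and `spark_layout` are hypothetical. Actually starting the daemons on remote hosts (for example with TORQUE's `pbsdsh`) is out of scope here.

```python
"""Hypothetical sketch: derive a Spark standalone layout from a TORQUE node file.

Assumption: this is NOT the pbs-spark-submit implementation. It only shows one
plausible mapping: first allocated host -> Spark master, every allocated host
-> Spark worker advertising as many cores as TORQUE allocated on it.
"""
import os
from collections import Counter

DEFAULT_MASTER_PORT = 7077  # default port of a Spark standalone master


def read_nodefile(path):
    """Return hostnames from a TORQUE node file (one line per allocated slot)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def spark_layout(hosts, port=DEFAULT_MASTER_PORT):
    """Map a list of per-slot hostnames to (master_url, {host: cores}).

    TORQUE repeats a host once per allocated processor, so counting
    occurrences gives the number of cores each worker should advertise.
    """
    if not hosts:
        raise ValueError("empty node list")
    # Counter is a dict subclass, so it keeps first-seen host order (Python 3.7+).
    cores = dict(Counter(hosts))
    # Assumed policy: the first listed host (typically the node running the
    # job script) hosts the Spark master.
    master = hosts[0]
    return f"spark://{master}:{port}", cores


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        url, workers = spark_layout(read_nodefile(nodefile))
        print("master:", url)
        for host, n in workers.items():
            print("worker:", host, "cores:", n)
```

Run inside a TORQUE job, the script prints the master URL and one worker line per host. Outside a job, `$PBS_NODEFILE` is unset and it prints nothing. Deriving worker sizes from slot counts keeps Spark within the cores the scheduler actually granted, much as an MPI launcher places ranks according to the same node file.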
Original language | English |
---|---|
Title of host publication | Proceedings of the XSEDE 2015 Conference |
Subtitle of host publication | Scientific Advancements Enabled by Enhanced Cyberinfrastructure |
Publisher | Association for Computing Machinery |
ISBN (Electronic) | 9781450337205 |
DOIs | |
State | Published - Jul 26 2015 |
Externally published | Yes |
Event | 4th Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2015 - St. Louis, United States. Duration: Jul 26 2015 → Jul 30 2015 |
Publication series
Name | ACM International Conference Proceeding Series |
---|---|
Volume | 2015-July |
Conference
Conference | 4th Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2015 |
---|---|
Country/Territory | United States |
City | St. Louis |
Period | 07/26/15 → 07/30/15 |
Bibliographical note
Publisher Copyright: Copyright © 2015 ACM.
Keywords
- Apache Spark
- Batch processing
- Data analytics
- NICS
- PBS
- TORQUE