A Heterogeneity-Aware Task Scheduler for Spark

Luna Xu, Ali R. Butt, Seung Hwan Lim, Ramakrishnan Kannan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

Big data processing systems such as Spark are employed in an increasing number of diverse applications, including machine learning, graph computation, and scientific computing, each with dynamic and differing resource needs. These applications increasingly run on heterogeneous hardware, e.g., with out-of-core accelerators. However, big data platforms do not account for the multi-dimensional heterogeneity of applications and hardware. This leads to a fundamental mismatch between application and hardware characteristics on one side and the resource scheduling adopted by big data platforms on the other. For example, Hadoop and Spark consider only data locality when assigning tasks to nodes and typically disregard hardware capabilities and their suitability to specific application requirements. In this paper, we present RUPAM, a heterogeneity-aware task scheduling system for big data platforms that considers both task-level resource characteristics and underlying hardware characteristics while preserving data locality. RUPAM adopts a simple yet effective heuristic to decide the dominant scheduling factor (e.g., CPU, memory, or I/O) for a given task in a particular stage. Our experiments show that RUPAM improves the performance of representative applications by up to 62.3% compared to the standard Spark scheduler.
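The abstract does not spell out RUPAM's heuristic, so the following Python sketch only illustrates the general idea it describes: pick a task's dominant resource factor, then prefer nodes whose hardware suits that factor while still rewarding data locality. This is not RUPAM's algorithm. The names `TaskProfile`, `Node`, `dominant_factor`, `locality_only_pick`, and `heterogeneity_aware_pick`, the normalized demand and capability scores, and the additive `locality_bonus` are all hypothetical assumptions made for illustration; the locality-only baseline is a deliberate simplification of Spark's default placement, not its actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Resource dimensions this sketch considers (the abstract names CPU, memory, and I/O).
RESOURCES = ("cpu", "memory", "io")


@dataclass
class TaskProfile:
    """Hypothetical per-task profile, e.g. estimated from earlier tasks in the same stage."""
    task_id: str
    demand: dict[str, float]  # hypothetical normalized demand per resource, 0.0..1.0
    preferred_nodes: set[str] = field(default_factory=set)  # nodes holding the task's input data


@dataclass
class Node:
    """Hypothetical worker node with a relative capability score per resource."""
    name: str
    capability: dict[str, float]


def dominant_factor(task: TaskProfile) -> str:
    """Return the resource the task demands most; ties go to the earlier entry in RESOURCES."""
    return max(RESOURCES, key=lambda r: task.demand.get(r, 0.0))


def locality_only_pick(task: TaskProfile, nodes: list[Node]) -> Node:
    """Simplified locality-only baseline: first node holding the data, else the first node."""
    for node in nodes:
        if node.name in task.preferred_nodes:
            return node
    return nodes[0]


def heterogeneity_aware_pick(task: TaskProfile, nodes: list[Node],
                             locality_bonus: float = 0.2) -> Node:
    """Score nodes on the task's dominant factor, adding a hypothetical bonus for locality."""
    factor = dominant_factor(task)

    def score(node: Node) -> float:
        s = node.capability.get(factor, 0.0)
        if node.name in task.preferred_nodes:
            s += locality_bonus  # keeps local placement attractive without making it absolute
        return s

    return max(nodes, key=score)


if __name__ == "__main__":
    cluster = [
        Node("fast-cpu", {"cpu": 1.0, "memory": 0.5, "io": 0.4}),
        Node("big-mem", {"cpu": 0.9, "memory": 1.0, "io": 0.5}),
        Node("ssd-node", {"cpu": 0.5, "memory": 0.5, "io": 1.0}),
    ]
    task = TaskProfile("t1", {"cpu": 0.3, "memory": 0.9, "io": 0.2}, {"ssd-node"})
    print(dominant_factor(task))                         # memory
    print(locality_only_pick(task, cluster).name)        # ssd-node
    print(heterogeneity_aware_pick(task, cluster).name)  # big-mem
```

With these made-up numbers, the locality-only baseline sends the memory-heavy task `t1` to `ssd-node` because that node holds its input, while the heterogeneity-aware sketch places it on `big-mem`. The additive bonus is just one simple way to trade locality against hardware fit; RUPAM's actual trade-off may differ.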

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 245-256
Number of pages: 12
ISBN (Electronic): 9781538683194
State: Published, Oct 29, 2018
Event: 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018, Belfast, United Kingdom
Duration: Sep 10, 2018 to Sep 13, 2018

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
Volume: 2018-September
ISSN (Print): 1552-5244

Conference

Conference: 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018
Country/Territory: United Kingdom
City: Belfast
Period: 09/10/18 to 09/13/18

Funding

This work is sponsored in part by the NSF under the grants: CNS-1405697, CNS-1422788, and CNS-1615411. This research also used resources of the OLCF at the Oak Ridge National Laboratory and this manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Funders (funder numbers)
National Science Foundation: CNS-1615411, CNS-1422788, CNS-1405697
Oak Ridge National Laboratory

Keywords

• Big Data
• Heterogeneity
• Resource Management
• Scheduling
• Spark
