TY - GEN
T1 - A Heterogeneity-Aware Task Scheduler for Spark
AU - Xu, Luna
AU - Butt, Ali R.
AU - Lim, Seung Hwan
AU - Kannan, Ramakrishnan
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/10/29
Y1 - 2018/10/29
N2 - Big data processing systems such as Spark are employed in an increasing number of diverse applications, such as machine learning, graph computation, and scientific computing, each with dynamic and different resource needs. These applications increasingly run on heterogeneous hardware, e.g., with out-of-core accelerators. However, big data platforms do not factor in the multi-dimensional heterogeneity of applications and hardware. This leads to a fundamental mismatch between the application and hardware characteristics and the resource scheduling adopted in big data platforms. For example, Hadoop and Spark consider only data locality when assigning tasks to nodes, and typically disregard the hardware capabilities and their suitability for specific application requirements. In this paper, we present RUPAM, a heterogeneity-aware task scheduling system for big data platforms, which considers both task-level resource characteristics and underlying hardware characteristics while preserving data locality. RUPAM adopts a simple yet effective heuristic to decide the dominant scheduling factor (e.g., CPU, memory, or I/O), given a task in a particular stage. Our experiments show that RUPAM improves the performance of representative applications by up to 62.3% compared to the standard Spark scheduler.
AB - Big data processing systems such as Spark are employed in an increasing number of diverse applications, such as machine learning, graph computation, and scientific computing, each with dynamic and different resource needs. These applications increasingly run on heterogeneous hardware, e.g., with out-of-core accelerators. However, big data platforms do not factor in the multi-dimensional heterogeneity of applications and hardware. This leads to a fundamental mismatch between the application and hardware characteristics and the resource scheduling adopted in big data platforms. For example, Hadoop and Spark consider only data locality when assigning tasks to nodes, and typically disregard the hardware capabilities and their suitability for specific application requirements. In this paper, we present RUPAM, a heterogeneity-aware task scheduling system for big data platforms, which considers both task-level resource characteristics and underlying hardware characteristics while preserving data locality. RUPAM adopts a simple yet effective heuristic to decide the dominant scheduling factor (e.g., CPU, memory, or I/O), given a task in a particular stage. Our experiments show that RUPAM improves the performance of representative applications by up to 62.3% compared to the standard Spark scheduler.
KW - Big Data
KW - Heterogeneity
KW - Resource Management
KW - Scheduling
KW - Spark
UR - http://www.scopus.com/inward/record.url?scp=85057253283&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER.2018.00042
DO - 10.1109/CLUSTER.2018.00042
M3 - Conference contribution
AN - SCOPUS:85057253283
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 245
EP - 256
BT - Proceedings - 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018
Y2 - 10 September 2018 through 13 September 2018
ER -