TY - GEN
T1 - Collaborative reuse of streaming dataflows in IoT applications
AU - Chaturvedi, Shilpa
AU - Tyagi, Sahil
AU - Simmhan, Yogesh
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/11/14
Y1 - 2017/11/14
N2 - Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark Streaming enable composition of continuous dataflows that execute persistently over data streams. They are used by Internet of Things (IoT) applications to analyze sensor data from Smart City cyber-infrastructure and make active utility management decisions. As the ecosystem of such IoT applications that leverage shared urban sensor streams continues to grow, applications will perform duplicate pre-processing and analytics tasks. This offers the opportunity to collaboratively reuse the outputs of overlapping dataflows, thereby improving resource efficiency. In this paper, we propose dataflow reuse algorithms that, given a submitted dataflow, identify the intersection of reusable tasks and streams from a collection of running dataflows to form a merged dataflow. Similar algorithms to unmerge dataflows when they are removed are also proposed. We implement these algorithms for the popular Apache Storm DSPS, and validate their performance and resource savings for 35 synthetic dataflows based on public OPMW workflows with diverse arrival and departure distributions, and on 21 real IoT dataflows from RIoTBench. We see that our Reuse algorithms reduce the count of running tasks by 38-46% for the two workloads and cumulative CPU usage by 36-51%, which can result in real cost savings on Cloud resources.
AB - Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark Streaming enable composition of continuous dataflows that execute persistently over data streams. They are used by Internet of Things (IoT) applications to analyze sensor data from Smart City cyber-infrastructure and make active utility management decisions. As the ecosystem of such IoT applications that leverage shared urban sensor streams continues to grow, applications will perform duplicate pre-processing and analytics tasks. This offers the opportunity to collaboratively reuse the outputs of overlapping dataflows, thereby improving resource efficiency. In this paper, we propose dataflow reuse algorithms that, given a submitted dataflow, identify the intersection of reusable tasks and streams from a collection of running dataflows to form a merged dataflow. Similar algorithms to unmerge dataflows when they are removed are also proposed. We implement these algorithms for the popular Apache Storm DSPS, and validate their performance and resource savings for 35 synthetic dataflows based on public OPMW workflows with diverse arrival and departure distributions, and on 21 real IoT dataflows from RIoTBench. We see that our Reuse algorithms reduce the count of running tasks by 38-46% for the two workloads and cumulative CPU usage by 36-51%, which can result in real cost savings on Cloud resources.
UR - https://www.scopus.com/pages/publications/85043762768
U2 - 10.1109/eScience.2017.54
DO - 10.1109/eScience.2017.54
M3 - Conference contribution
AN - SCOPUS:85043762768
T3 - Proceedings - 13th IEEE International Conference on eScience, eScience 2017
SP - 403
EP - 412
BT - Proceedings - 13th IEEE International Conference on eScience, eScience 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th IEEE International Conference on eScience, eScience 2017
Y2 - 24 October 2017 through 27 October 2017
ER -