TY - GEN
T1 - FatMan vs. LittleBoy: Scaling up Linear Algebraic Operations in Scale-out Data Platforms
T2 - 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, PDSW-DISCS 2016
AU - Xu, Luna
AU - Lim, Seung Hwan
AU - Butt, Ali R.
AU - Sukumar, Sreenivas R.
AU - Kannan, Ramakrishnan
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2017/1/30
Y1 - 2017/1/30
AB - Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling up as well as scaling out such algorithms is highly desirable to enable efficient processing over millions of data points. To this end, we present a matrix manipulation approach that effectively scales up each node in a scale-out data-parallel platform such as Apache Spark. Specifically, we enable hardware acceleration for matrix multiplications in a distributed Spark setup without user intervention. Our approach supports both dense and sparse distributed matrices, and provides flexible control of acceleration by matrix density. We demonstrate the benefit of our approach for generalized matrix multiplication operations over large matrices with up to four billion elements. To connect the effectiveness of our approach with machine learning applications, we perform Gramian matrix computation via generalized matrix multiplications. Our experiments show that our approach achieves more than 2× performance speed-up, and up to 96.1% computation improvement, compared to the state-of-the-art Spark MLlib for dense matrices.
UR - http://www.scopus.com/inward/record.url?scp=85015283919&partnerID=8YFLogxK
U2 - 10.1109/PDSW-DISCS.2016.009
DO - 10.1109/PDSW-DISCS.2016.009
M3 - Conference contribution
AN - SCOPUS:85015283919
T3 - Proceedings of PDSW-DISCS 2016: 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems - Held in conjunction with SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 25
EP - 30
BT - Proceedings of PDSW-DISCS 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 14 November 2016
ER -