FatMan vs. LittleBoy: Scaling up linear algebraic operations in scale-out data platforms

Luna Xu, Seung Hwan Lim, Ali R. Butt, Sreenivas R. Sukumar, Ramakrishnan Kannan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling such algorithms up as well as out is highly desirable to enable efficient processing over millions of data points. To this end, we present a matrix manipulation approach that effectively scales up each node in a scale-out data-parallel platform such as Apache Spark. Specifically, we enable hardware acceleration for matrix multiplications in a distributed Spark setup without user intervention. Our approach supports both dense and sparse distributed matrices, and provides flexible control of acceleration by matrix density. We demonstrate the benefit of our approach for generalized matrix multiplication operations over large matrices with up to four billion elements. To connect the effectiveness of our approach with machine learning applications, we performed Gramian matrix computation via generalized matrix multiplications. Our experiments show that our approach achieves more than 2× performance speed-up, and up to 96.1% computation improvement, compared to the state-of-the-art Spark MLlib for dense matrices.
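As a rough illustration of two ideas named in the abstract, computing a Gramian matrix (AᵀA) through general matrix multiplication and choosing a multiplication path by matrix density, the following is a minimal single-process sketch. It is not the authors' implementation and does not use Spark or any hardware accelerator. The function names, the `threshold` parameter, and its default value of 0.5 are all hypothetical, chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def density(m: Matrix) -> float:
    """Fraction of non-zero entries in the matrix."""
    total = sum(len(row) for row in m)
    nonzero = sum(1 for row in m for x in row if x != 0)
    return nonzero / total if total else 0.0


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def gemm_dense(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense multiply; stands in for an accelerated dense kernel."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def gemm_sparse(a: Matrix, b: Matrix) -> Matrix:
    """Multiply that skips zero entries; stands in for a sparse kernel."""
    n_cols = len(b[0])
    out = [[0.0] * n_cols for _ in a]
    for i, row in enumerate(a):
        for k, x in enumerate(row):
            if x == 0:
                continue
            for j, y in enumerate(b[k]):
                if y != 0:
                    out[i][j] += x * y
    return out


def multiply(a: Matrix, b: Matrix, threshold: float = 0.5) -> Matrix:
    """Pick a kernel by density. The threshold is a hypothetical knob."""
    if density(a) >= threshold and density(b) >= threshold:
        return gemm_dense(a, b)
    return gemm_sparse(a, b)


def gramian(a: Matrix, threshold: float = 0.5) -> Matrix:
    """Gramian matrix A^T A, expressed as one general matrix multiplication."""
    return multiply(transpose(a), a, threshold)
```

For example, `gramian([[1, 2], [3, 4], [5, 6]])` takes the dense path, while a mostly-zero input such as `[[1, 0], [0, 0], [0, 2]]` falls below the illustrative threshold and takes the sparse path. In a distributed setting the same dispatch decision would presumably be made per block on each node.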

Original language: English
Title of host publication: Proceedings of PDSW-DISCS 2016
Subtitle of host publication: 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems - Held in conjunction with SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 25-30
Number of pages: 6
ISBN (Electronic): 9781509052165
State: Published - Jan 30 2017
Event: 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, PDSW-DISCS 2016 - Salt Lake City, United States
Duration: Nov 14 2016 → …

Publication series

Name: Proceedings of PDSW-DISCS 2016: 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems - Held in conjunction with SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, PDSW-DISCS 2016
Country/Territory: United States
City: Salt Lake City
Period: 11/14/16 → …
