Abstract
Efficient and scalable matrix operations are in high demand in the current era of Machine Learning, Deep Learning, and Big Data Analytics. Two commonly used matrix-matrix operations in the Basic Linear Algebra Subprograms (BLAS) specification are General Matrix-Matrix multiplication (GEMM) and Symmetric Rank-k update (SYRK). The SYRK routine is a specialization of the GEMM routine in which half of the multiplications are skipped because the resulting matrix is known to be symmetric. Fortunately, several linear algebra libraries implement these BLAS routines quite efficiently. These libraries usually partition the input matrices into blocks and keep the blocks in processor caches, which improves performance through cache reuse. However, contemporary libraries are highly optimized for squarish matrices, and their performance degrades significantly on multicore machines for edge-case matrices (strictly thin or strictly fat shapes). The primary reason is that current state-of-the-art libraries use fixed block shapes chosen for a processor architecture and do not consider the shape of the input matrices. In this paper, we propose a new blocking approach, which we name Flexible-blocking, to mitigate these scalability issues. In contrast to contemporary libraries, our approach forms the blocks of the input matrices based on the shapes of the matrices as well as the number of threads used in the implementation. Our proposed technique shows noticeable performance improvement on multicore shared-memory machines for edge-case matrices.
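To make the relationship between GEMM and SYRK concrete, the following is a minimal pure-Python sketch, not the paper's implementation. It computes C = A·Aᵀ by visiting only the blocks on or below the diagonal and then mirroring the result, which is the sense in which SYRK skips roughly half of the multiplications that a general GEMM would perform. The function names and the `block_rows` and `block_k` parameters are hypothetical, chosen only for illustration; real BLAS libraries pick block shapes from cache sizes, and the Flexible-blocking rule for choosing them is not reproduced here.

```python
def gemm(a, b):
    """Reference GEMM: return A * B for matrices stored as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def syrk_lower_blocked(a, block_rows=2, block_k=2):
    """Return C = A * A^T, computing only the lower triangle block by block.

    block_rows and block_k are illustrative block dimensions (assumptions,
    not values from the paper). The result must not depend on them.
    """
    n, k = len(a), len(a[0])
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, block_rows):
        i1 = min(i0 + block_rows, n)
        # Only visit column blocks on or below the diagonal block.
        for j0 in range(0, i0 + 1, block_rows):
            j1 = min(j0 + block_rows, n)
            # Walk the shared dimension in panels of block_k columns.
            for p0 in range(0, k, block_k):
                p1 = min(p0 + block_k, k)
                for i in range(i0, i1):
                    # Inside a diagonal block, stop at the diagonal entry.
                    for j in range(j0, min(j1, i + 1)):
                        acc = 0.0
                        for p in range(p0, p1):
                            acc += a[i][p] * a[j][p]
                        c[i][j] += acc
    # Fill the upper triangle by symmetry instead of recomputing it.
    for i in range(n):
        for j in range(i):
            c[j][i] = c[i][j]
    return c


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]
```

For a thin 5×3 matrix `a`, `syrk_lower_blocked(a, 2, 2)` and `syrk_lower_blocked(a, 3, 1)` should both agree with `gemm(a, transpose(a))`. Changing the block shape changes only the order in which the work is done, which is what leaves room for a library to adapt the block shape to the input.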
Original language | English |
---|---|
Title of host publication | Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018 |
Editors | Naoki Abe, Huan Liu, Calton Pu, Xiaohua Hu, Nesreen Ahmed, Mu Qiao, Yang Song, Donald Kossmann, Bing Liu, Kisung Lee, Jiliang Tang, Jingrui He, Jeffrey Saltz |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 3853-3862 |
Number of pages | 10 |
ISBN (Electronic) | 9781538650356 |
State | Published - Jul 2 2018 |
Event | 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States; Duration: Dec 10 2018 → Dec 13 2018 |
Publication series
Name | Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018 |
---|---|
Conference
Conference | 2018 IEEE International Conference on Big Data, Big Data 2018 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 12/10/18 → 12/13/18 |
Funding
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Keywords
- BLAS
- Big Data
- Flexible-blocking
- Multicore
- Performance Tuning