TY - GEN
T1 - A Flexible-blocking Based Approach for Performance Tuning of Matrix Multiplication Routines for Large Matrices with Edge Cases
AU - Hossain, Md Mosharaf
AU - Hines, Thomas M.
AU - Ghafoor, Sheikh K.
AU - Rabiul Islam, Sheikh
AU - Kannan, Ramakrishnan
AU - Sukumar, Sreenivas R.
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
AB - Efficient and scalable matrix operations are in high demand in the current era of Machine Learning, Deep Learning, and Big Data Analytics. Two commonly used matrix-matrix operations in the Basic Linear Algebra Subprograms (BLAS) specification are General Matrix-Matrix multiplication (GEMM) and Symmetric Rank-k update (SYRK). The SYRK routine is a specialization of the GEMM routine in which half of the multiplications are skipped because the resulting matrix is known to be symmetric. Several linear algebra libraries implement these BLAS routines efficiently. These libraries usually partition the input matrices into blocks and place them in processor caches, improving performance through cache reuse. However, contemporary libraries are highly optimized for squarish matrices, and their performance degrades significantly on multicore machines for edge-case matrices (strictly thin or strictly fat shapes). The primary reason is that current state-of-the-art libraries use fixed block shapes chosen for a processor architecture and do not consider the shape of the input matrices. In this paper, we propose a new blocking approach, which we name Flexible-blocking, to mitigate these scalability issues. In contrast to contemporary libraries, our approach forms the blocks of the input matrices based on the shapes of the matrices as well as the number of threads used in the implementation. Our proposed technique shows a noticeable performance improvement on multicore shared-memory machines for edge-case matrices.
KW - BLAS
KW - Big Data
KW - Flexible-blocking
KW - Multicore
KW - Performance Tuning
UR - http://www.scopus.com/inward/record.url?scp=85062602951&partnerID=8YFLogxK
U2 - 10.1109/BigData.2018.8622013
DO - 10.1109/BigData.2018.8622013
M3 - Conference contribution
AN - SCOPUS:85062602951
T3 - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
SP - 3853
EP - 3862
BT - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
A2 - Abe, Naoki
A2 - Liu, Huan
A2 - Pu, Calton
A2 - Hu, Xiaohua
A2 - Ahmed, Nesreen
A2 - Qiao, Mu
A2 - Song, Yang
A2 - Kossmann, Donald
A2 - Liu, Bing
A2 - Lee, Kisung
A2 - Tang, Jiliang
A2 - He, Jingrui
A2 - Saltz, Jeffrey
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Big Data, Big Data 2018
Y2 - 10 December 2018 through 13 December 2018
ER -