AutoGEMM: Pushing the Limits of Irregular Matrix Multiplication on Arm Architectures

  • Du Wu
  • Jintao Meng
  • Wenxi Zhu
  • Minwen Deng
  • Xiao Wang
  • Tao Luo
  • Mohamed Wahib
  • Yanjie Wei

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

This paper presents an open-source library that pushes the limits of performance portability for irregular General Matrix Multiplication (GEMM) on widely used Arm architectures. Our library, autoGEMM, is designed to support a wide range of Arm processors, from edge devices to HPC-grade CPUs. autoGEMM generates optimized kernels for various hardware configurations by automatically combining fragments of auto-generated micro-kernels that employ hand-written optimizations to maximize computational efficiency. We optimize the kernel pipeline by tuning register reuse and the overlap of data loads and stores. In addition, we use a dynamic tiling scheme to generate balanced tile shapes. Finally, we position autoGEMM on top of the TVM framework, where our dynamic tiling scheme prunes the search space that TVM explores to identify the optimal combination of code-optimization parameters. Evaluations on five different classes of Arm chips demonstrate the advantages of autoGEMM. For small matrices, autoGEMM achieves 98% of peak performance and up to a 2.0x speedup over state-of-the-art libraries such as LIBXSMM and LibShalom. For irregular matrices (i.e., tall-and-skinny and long rectangular shapes), autoGEMM is 1.3-2.0x faster than widely used libraries such as OpenBLAS and Eigen. autoGEMM is publicly available at: https://github.com/wudu98/autoGEMM.
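To make the idea of "balanced tile shapes" and search-space pruning more concrete, here is a minimal sketch in Python. It is not autoGEMM's actual tiling algorithm or cost model; the function names (`naive_tiles`, `balanced_tiles`, `fits_registers`, `candidate_shapes`) and the register-budget model are hypothetical, assuming an fp32 micro-kernel on 128-bit SIMD with 4 lanes and 32 vector registers (as on AArch64). The sketch contrasts a naive split, which leaves a small remainder tile, with a balanced split into the same number of near-equal tiles, and filters micro-kernel shapes that would not fit in registers.

```python
def naive_tiles(extent: int, max_tile: int) -> list[int]:
    """Full tiles of size max_tile followed by one (possibly tiny) remainder tile."""
    full, rem = divmod(extent, max_tile)
    return [max_tile] * full + ([rem] if rem else [])


def balanced_tiles(extent: int, max_tile: int) -> list[int]:
    """Fewest tiles no larger than max_tile, with sizes differing by at most one."""
    if extent <= 0 or max_tile <= 0:
        raise ValueError("extent and max_tile must be positive")
    count = (extent + max_tile - 1) // max_tile  # same tile count as the naive split
    base, extra = divmod(extent, count)
    # The first `extra` tiles take one extra element so the sizes sum to `extent`.
    return [base + 1] * extra + [base] * (count - extra)


def fits_registers(mr: int, nr: int, lanes: int = 4, num_regs: int = 32) -> bool:
    """Hypothetical register-budget model for an mr x nr fp32 micro-kernel:
    mr * (nr / lanes) accumulator vectors for C, plus the vectors needed to
    hold mr elements of A and nr elements of B for one k step."""
    if nr % lanes:
        return False  # assume nr must fill whole SIMD vectors
    c_regs = mr * (nr // lanes)
    a_regs = (mr + lanes - 1) // lanes
    b_regs = nr // lanes
    return c_regs + a_regs + b_regs <= num_regs


def candidate_shapes(max_mr: int, max_nr: int, lanes: int = 4) -> list[tuple[int, int]]:
    """Hypothetical pruning step: keep only micro-kernel shapes that fit in registers."""
    return [
        (mr, nr)
        for mr in range(1, max_mr + 1)
        for nr in range(lanes, max_nr + 1, lanes)
        if fits_registers(mr, nr, lanes)
    ]


if __name__ == "__main__":
    print(naive_tiles(17, 8))     # [8, 8, 1]
    print(balanced_tiles(17, 8))  # [6, 6, 5]
    print(fits_registers(8, 12), fits_registers(16, 8))  # True False
```

With 17 rows and a maximum tile height of 8, the naive split ends in a one-row tile whose micro-kernel would get little register reuse, while the balanced split uses three tiles of 6, 6, and 5. Because each maximum tile size maps to exactly one balanced tiling, a tuner in this sketch only has to choose among register-feasible maximum shapes rather than among arbitrary tile combinations.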

Original language: English
Title of host publication: Proceedings of SC 2024
Subtitle of host publication: International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
ISBN (Electronic): 9798350352917
State: Published - 2024
Event: 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024 - Atlanta, United States
Duration: Nov 17 2024 to Nov 22 2024

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024
Country/Territory: United States
City: Atlanta
Period: 11/17/24 to 11/22/24

Funding

This work was partly supported by the Key Research and Development Project of Guangdong Province under Grant No. 2021B0101310002, the Shenzhen-Hong Kong Joint Funding Project (A) under Grant No. SGDX20230116092056010, the National Key Research and Development Program of China under Grant Nos. 2021YFF0901102, 2021YFF1200100, and 2021YFF1200104, and the Shenzhen Key Laboratory of Intelligent Bioinformatics under Grant No. ZDSYS20220422103800001. This work was also supported by Shanghai Zelixir Biotech Company through the Joint Lab of Zelixir-SIAT, and by Tencent, which provided two years of continuous funding.
