
High-Performance Mixed-Precision Matrix Multiplication via Tile-Centric Design on Modern Architectures

  • Qiao Zhang
  • Rabab Alomairy
  • Dali Wang
  • Zhuowei Gu
  • Qinglei Cao

Research output: Contribution to journal › Article › peer-review

Abstract

General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic necessitates a reevaluation of numerical algorithms to leverage mixed-precision computations, achieving improved performance and energy efficiency. This research presents an adaptive mixed-precision GEMM framework that enables support for various precision formats at fine-grained tile and block levels, offering a reliable foundation for trustworthy mixed-precision computations. Furthermore, we leverage the PaRSEC runtime system to effectively balance workloads across diverse architectures. The performance exhibits strong scalability across both homogeneous platforms (Intel CPU-based systems and the ARM CPU-based Fugaku supercomputer) and heterogeneous systems (Nvidia V100, A100, and H100 GPU-based platforms, as well as the AMD GPU-based Frontier supercomputer). This work aims to improve computational efficiency and accuracy by bridging algorithmic innovations with hardware capabilities, fostering transformative advancements across a wide range of applications.
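To make the idea of tile-level precision concrete, the following is a minimal sketch and not the authors' PaRSEC-based implementation. It assumes each tile of A and B is stored in one of three formats and that every partial product is accumulated in double precision. The names `round_to`, `tiled_mixed_gemm`, `prec_a`, and `prec_b` are hypothetical, and the storage rounding is emulated with the standard `struct` module rather than with hardware low-precision units.

```python
import struct

# Hypothetical format table: struct codes used only to emulate storage rounding.
_FORMATS = {"fp64": "d", "fp32": "f", "fp16": "e"}


def round_to(x, fmt):
    """Round a Python float to the named storage format and back to a float.

    Raises OverflowError if x is outside the range of the target format.
    """
    code = _FORMATS[fmt]
    return struct.unpack(code, struct.pack(code, x))[0]


def tiled_mixed_gemm(A, B, tile, prec_a, prec_b):
    """Compute C = A @ B with per-tile storage precision (illustrative only).

    A is m x k and B is k x n, both given as lists of lists.
    tile is the tile edge length.
    prec_a[i][p] names the format of tile (i, p) of A,
    prec_b[p][j] names the format of tile (p, j) of B.
    Accumulation is always done in double precision (Python floats).
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # Look up the storage format of the two input tiles.
                fa = prec_a[i0 // tile][p0 // tile]
                fb = prec_b[p0 // tile][j0 // tile]
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = 0.0
                        for p in range(p0, min(p0 + tile, k)):
                            acc += round_to(A[i][p], fa) * round_to(B[p][j], fb)
                        C[i][j] += acc
    return C


# Usage: one 2 x 2 tile of A stored in fp16, B kept in fp64.
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5, 0.25], [1.0, 2.0]]
C = tiled_mixed_gemm(A, B, 2, [["fp16"]], [["fp64"]])
```

All entries in this usage example are exactly representable in half precision, so the result matches the full-precision product. With entries such as 0.1, the fp16 tiles would introduce rounding error, which is the accuracy and performance trade-off a per-tile precision map is meant to control.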

Original language: English
Article number: 24
Journal: SN Computer Science
Volume: 7
Issue number: 1
State: Published - Jan 2026

Funding

This research was supported by internal awards from Saint Louis University (Grant-0001651 and PROJ-000498) and by the U.S. National Science Foundation under Award OAC-2451577. For computational resources, this work used the compute node at the Innovative Computing Laboratory of the University of Tennessee, Knoxville, the Fugaku supercomputer at RIKEN, the Polaris supercomputer at Argonne National Laboratory, and the Frontier supercomputer at Oak Ridge National Laboratory.

Keywords

  • General matrix multiply
  • High-performance computing
  • Mixed precision
  • Task-based runtime
