Abstract
General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic calls for a reevaluation of numerical algorithms so that they can exploit mixed-precision computation for improved performance and energy efficiency. This research presents an adaptive mixed-precision GEMM framework that supports various precision formats at fine-grained tile and block levels, offering a reliable foundation for trustworthy mixed-precision computations. Furthermore, we leverage the PaRSEC runtime system to balance workloads effectively across diverse architectures. The framework exhibits strong scalability on both homogeneous platforms (Intel CPU-based systems and the ARM CPU-based Fugaku supercomputer) and heterogeneous systems (NVIDIA V100, A100, and H100 GPU-based platforms, as well as the AMD GPU-based Frontier supercomputer). This work aims to improve computational efficiency and accuracy by bridging algorithmic innovations with hardware capabilities, fostering advances across a wide range of applications.
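The idea of assigning a precision per tile can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and does not use PaRSEC. The function names (`pick_by_norm`, `to_tiles`, `tiled_gemm`), the norm-based precision rule, and the choice of float64 accumulation are all assumptions made for illustration only.

```python
import numpy as np

def pick_by_norm(tile, threshold=1.0):
    # Hypothetical rule (an assumption, not from the paper): store tiles with a
    # small Frobenius norm in float16 and all other tiles in float64.
    return np.float16 if np.linalg.norm(tile) < threshold else np.float64

def to_tiles(M, nb, pick_dtype=pick_by_norm):
    # Split M into nb x nb tiles; each tile is stored in its own dtype.
    n_rows, n_cols = M.shape
    return [[M[i:i + nb, j:j + nb].astype(pick_dtype(M[i:i + nb, j:j + nb]))
             for j in range(0, n_cols, nb)]
            for i in range(0, n_rows, nb)]

def tiled_gemm(A_tiles, B_tiles):
    # Compute C = A @ B tile by tile. Each tile product runs in the wider of
    # the two operand dtypes; partial sums are accumulated in float64.
    mt, kt, nt = len(A_tiles), len(B_tiles), len(B_tiles[0])
    C_tiles = []
    for i in range(mt):
        row = []
        for j in range(nt):
            rows = A_tiles[i][0].shape[0]
            cols = B_tiles[0][j].shape[1]
            acc = np.zeros((rows, cols), dtype=np.float64)
            for k in range(kt):
                a, b = A_tiles[i][k], B_tiles[k][j]
                work = np.promote_types(a.dtype, b.dtype)
                acc += a.astype(work) @ b.astype(work)
            row.append(acc)
        C_tiles.append(row)
    # Reassemble the tile grid into one dense matrix.
    return np.block(C_tiles)
```

Under this rule, only tiles that contribute little to the result are demoted to low precision, so the error stays small relative to a full float64 product. A real framework would select precision using its own accuracy criteria and dispatch tile tasks through a runtime rather than a Python loop.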
| Original language | English |
|---|---|
| Article number | 24 |
| Journal | SN Computer Science |
| Volume | 7 |
| Issue number | 1 |
| DOIs | |
| State | Published - Jan 2026 |
Funding
This research was supported by internal awards from Saint Louis University (Grant-0001651 and PROJ-000498) and by the U.S. National Science Foundation under Award OAC-2451577. For computational resources, this work used the compute node at the Innovative Computing Laboratory of the University of Tennessee, Knoxville, the Fugaku supercomputer at RIKEN, the Polaris supercomputer at Argonne National Laboratory, and the Frontier supercomputer at Oak Ridge National Laboratory.
Keywords
- General matrix multiply
- High-performance computing
- Mixed precision
- Task-based runtime
Fingerprint
Dive into the research topics of 'High-Performance Mixed-Precision Matrix Multiplication via Tile-Centric Design on Modern Architectures'. Together they form a unique fingerprint.