Leveraging Hardware-Aware Computation in Mixed-Precision Matrix Multiply: A Tile-Centric Approach

  • Qiao Zhang
  • Rabab Alomairy
  • Dali Wang
  • Zhuowei Gu
  • Qinglei Cao

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic calls for a reevaluation of numerical algorithms so that they can exploit mixed-precision computation for improved performance and energy efficiency. This research introduces an adaptive mixed-precision GEMM framework that supports different precision formats at the fine-grained tile/block level. We use the PaRSEC runtime system to balance workloads across various architectures. Performance scales well on the ARM CPU-based Fugaku supercomputer, the NVIDIA GPU-based A100 DGX system, and the AMD GPU-based Frontier supercomputer. By bridging algorithmic advances and hardware innovations, this research aims to improve both computational efficiency and accuracy across a range of applications.
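The central idea, assigning a precision format to each tile instead of to the whole matrix, can be illustrated with a small pure-Python sketch. This is not the authors' implementation and does not use PaRSEC. The norm-based selection rule (`choose_precision`), the tolerance `tol`, the choice to run each tile kernel in the more precise of its two operand formats, and all function names are hypothetical assumptions made for illustration. Lower-precision storage is emulated by round-tripping values through the `struct` module's half (`e`) and single (`f`) formats; on real hardware each tile update would instead be dispatched to an FP16, FP32, or FP64 GEMM kernel.

```python
import math
import struct

# Hypothetical sketch of tile-level mixed-precision GEMM (not the paper's code).
# struct "e" = IEEE binary16, "f" = binary32; Python floats are binary64.
_STRUCT_CODE = {"fp16": "e", "fp32": "f"}
# Unit roundoff u = 2^-p, p = significand bits including the implicit bit.
UNIT_ROUNDOFF = {"fp16": 2.0**-11, "fp32": 2.0**-24, "fp64": 2.0**-53}
RANK = {"fp16": 0, "fp32": 1, "fp64": 2}


def round_to(x: float, fmt: str) -> float:
    """Emulate storing x in the given format (tiny fp16 values underflow to 0)."""
    if fmt == "fp64":
        return x
    code = _STRUCT_CODE[fmt]
    return struct.unpack(code, struct.pack(code, x))[0]


def frobenius(M):
    return math.sqrt(sum(v * v for row in M for v in row))


def split_tiles(M, nb):
    """Split a square n x n matrix (n divisible by nb) into a T x T grid of nb x nb tiles."""
    T = len(M) // nb
    return [[[row[j * nb:(j + 1) * nb] for row in M[i * nb:(i + 1) * nb]]
             for j in range(T)] for i in range(T)]


def choose_precision(tile_norm, global_norm, tol):
    """Assumed rule: lowest format whose rounding error on this tile,
    roughly u * ||tile||_F, stays below tol * ||M||_F."""
    for fmt in ("fp16", "fp32"):
        if UNIT_ROUNDOFF[fmt] * tile_norm <= tol * global_norm:
            return fmt
    return "fp64"


def precision_map(M, nb, tol):
    g = frobenius(M)
    return [[choose_precision(frobenius(t), g, tol) for t in row]
            for row in split_tiles(M, nb)]


def tiled_mixed_gemm(A, B, nb, tol):
    """C = A @ B tile by tile; each (i, j, k) update is an independent tile task."""
    n = len(A)
    T = n // nb
    At, Bt = split_tiles(A, nb), split_tiles(B, nb)
    pa, pb = precision_map(A, nb, tol), precision_map(B, nb, tol)
    C = [[0.0] * n for _ in range(n)]
    for i in range(T):
        for j in range(T):
            for k in range(T):
                # Use the more precise of the two operand formats so that
                # neither tile is rounded below its assigned precision.
                fmt = max(pa[i][k], pb[k][j], key=RANK.__getitem__)
                a = [[round_to(v, fmt) for v in row] for row in At[i][k]]
                b = [[round_to(v, fmt) for v in row] for row in Bt[k][j]]
                # Accumulate in binary64, loosely mimicking higher-precision accumulation.
                for r in range(nb):
                    for c in range(nb):
                        C[i * nb + r][j * nb + c] += sum(a[r][m] * b[m][c] for m in range(nb))
    return C, pa, pb


if __name__ == "__main__":
    A = [[1.0, 2.0, 1e-4, 0.0],
         [3.0, 4.0, 0.0, 1e-4],
         [1e-9, 0.0, 5.0, 6.0],
         [0.0, 1e-9, 7.0, 8.0]]
    I4 = [[float(r == c) for c in range(4)] for r in range(4)]
    C, pa, pb = tiled_mixed_gemm(A, I4, nb=2, tol=1e-9)
    print(pa)  # [['fp64', 'fp32'], ['fp16', 'fp64']]
```

Under this assumed rule, tiles whose norm is small relative to the whole matrix tolerate a larger unit roundoff, so the near-zero off-diagonal tiles of `A` drop to FP32 or FP16 while the dominant diagonal tiles stay in FP64. Raising `tol` pushes more tiles to FP16 at the cost of accuracy.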

Original language: English
Title of host publication: Asynchronous Many-Task Systems and Applications - 3rd International Workshop, WAMTA 2025, Proceedings
Editors: Patrick Diehl, Qinglei Cao, Thomas Herault, George Bosilca
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 174-185
Number of pages: 12
ISBN (Print): 9783031971952
State: Published - 2026
Event: 3rd International Workshop on Asynchronous Many-Task Systems and Applications, WAMTA 2025 - St. Louis, United States
Duration: Feb 19 2025 to Feb 21 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15690 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd International Workshop on Asynchronous Many-Task Systems and Applications, WAMTA 2025
Country/Territory: United States
City: St. Louis
Period: 02/19/25 to 02/21/25

Funding

This research was supported by internal awards from Saint Louis University (Grant-0001651 and PROJ-000498) and the U.S. National Science Foundation (Award OAC-2451577). For computer time, this research used the Lonestar6 cluster at the Texas Advanced Computing Center, a compute node at the Innovative Computing Laboratory of the University of Tennessee, Knoxville, the Fugaku supercomputer at RIKEN, and the Frontier supercomputer at Oak Ridge National Laboratory.

Keywords

  • General matrix multiply
  • High-performance computing
  • Mixed precision
  • Task-based runtime
