PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers

Jaeyoung Choi, David W. Walker, Jack J. Dongarra

Research output: Contribution to journal › Article › peer-review

106 Scopus citations

Abstract

The paper describes Parallel Universal Matrix Multiplication Algorithms (PUMMA) on distributed memory concurrent computers. The PUMMA package includes not only the non‐transposed matrix multiplication routine C = A ⋅ B, but also the transposed multiplication routines C = Aᵀ ⋅ B, C = A ⋅ Bᵀ, and C = Aᵀ ⋅ Bᵀ, for a block cyclic data distribution. The routines perform efficiently for a wide range of processor configurations and block sizes. Together, the PUMMA routines provide the same functionality as the Level 3 BLAS routine xGEMM. Details of the parallel implementation of the routines are given, and results are presented for runs on the Intel Touchstone Delta computer.
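The four PUMMA routines correspond one-to-one with the transposition options of the sequential Level 3 BLAS xGEMM. As a rough illustration only (not code from the paper), the following C sketch uses the standard CBLAS interface to DGEMM to compute the same four products on a single processor; it assumes a CBLAS implementation such as OpenBLAS is available for linking.

/*
 * Minimal single-processor sketch of the four transposition cases that
 * PUMMA covers, expressed through the Level 3 BLAS routine DGEMM via the
 * standard CBLAS interface.  Build (assumption): cc demo.c -lopenblas
 */
#include <stdio.h>
#include <cblas.h>

int main(void) {
    /* Small square matrices so every transpose combination is conformable. */
    double A[4] = {1.0, 2.0,
                   3.0, 4.0};
    double B[4] = {5.0, 6.0,
                   7.0, 8.0};
    enum CBLAS_TRANSPOSE ops[2] = {CblasNoTrans, CblasTrans};
    const char *names[2] = {"", "T"};

    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            double C[4] = {0.0, 0.0, 0.0, 0.0};
            /* C = op(A) * op(B), where op(X) is X or its transpose */
            cblas_dgemm(CblasRowMajor, ops[i], ops[j],
                        2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);
            printf("C = A%s * B%s: [%g %g; %g %g]\n",
                   names[i], names[j], C[0], C[1], C[2], C[3]);
        }
    }
    return 0;
}

PUMMA provides the distributed-memory analogue of these four cases over block cyclically distributed matrices, rather than a single local call as in the sketch above.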

Original language: English
Pages (from-to): 543-570
Number of pages: 28
Journal: Concurrency: Practice and Experience
Volume: 6
Issue number: 7
DOIs
State: Published - Oct 1994
Externally published: Yes

