TY - JOUR
T1 - PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers
T2 - Concurrency: Practice and Experience
AU - Choi, Jaeyoung
AU - Walker, David W.
AU - Dongarra, Jack J.
PY - 1994/10
Y1 - 1994/10
N2 - The paper describes Parallel Universal Matrix Multiplication Algorithms (PUMMA) on distributed memory concurrent computers. The PUMMA package includes not only the non-transposed matrix multiplication routine C = A ⋅ B, but also transposed multiplication routines C = Aᵀ ⋅ B, C = A ⋅ Bᵀ, and C = Aᵀ ⋅ Bᵀ, for a block cyclic data distribution. The routines perform efficiently for a wide range of processor configurations and block sizes. The PUMMA together provide the same functionality as the Level 3 BLAS routine xGEMM. Details of the parallel implementation of the routines are given, and results are presented for runs on the Intel Touchstone Delta computer.
AB - The paper describes Parallel Universal Matrix Multiplication Algorithms (PUMMA) on distributed memory concurrent computers. The PUMMA package includes not only the non-transposed matrix multiplication routine C = A ⋅ B, but also transposed multiplication routines C = Aᵀ ⋅ B, C = A ⋅ Bᵀ, and C = Aᵀ ⋅ Bᵀ, for a block cyclic data distribution. The routines perform efficiently for a wide range of processor configurations and block sizes. The PUMMA together provide the same functionality as the Level 3 BLAS routine xGEMM. Details of the parallel implementation of the routines are given, and results are presented for runs on the Intel Touchstone Delta computer.
UR - http://www.scopus.com/inward/record.url?scp=0028530654&partnerID=8YFLogxK
U2 - 10.1002/cpe.4330060702
DO - 10.1002/cpe.4330060702
M3 - Article
AN - SCOPUS:0028530654
SN - 1040-3108
VL - 6
SP - 543
EP - 570
JO - Concurrency: Practice and Experience
JF - Concurrency: Practice and Experience
IS - 7
ER -