Parallel matrix transpose algorithms on distributed memory concurrent computers

Jaeyoung Choi, J. J. Dongarra, D. W. Walker

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P×Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms make use of nonblocking, point-to-point communication between processors. The use of nonblocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A·B, the algorithms are used to compute parallel multiplications of transposed matrices, C = Aᵀ·Bᵀ, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
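The block scattered (block-cyclic) distribution the abstract refers to assigns each matrix block to a processor in the P×Q template in round-robin fashion. A minimal sketch of that index mapping, and of why a transpose moves data between processors, is given below; the function names are illustrative and not part of PUMMA.

```python
def owner(i, j, nb, P, Q):
    """Return (p, q), the coordinates in the P x Q processor template of
    the processor holding global element (i, j) under a block scattered
    distribution with block size nb: blocks are dealt out cyclically
    along each dimension."""
    return ((i // nb) % P, (j // nb) % Q)

def transpose_source(i, j, nb, P, Q):
    """Element (i, j) of the transposed matrix is element (j, i) of the
    original, which in general lives on a different processor; each
    processor overlaps the resulting messages using nonblocking sends."""
    return owner(j, i, nb, P, Q)
```

For example, with nb = 2 on a 2×3 template, element (0, 5) is held by processor (0, 2), while its transposed counterpart (5, 0) is held by processor (0, 0), so computing the transpose requires communication between the two.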

Original language: English
Title of host publication: Proceedings of Scalable Parallel Libraries Conference, SPLC 1993
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 245-252
Number of pages: 8
ISBN (Electronic): 0818649801, 9780818649806
DOIs
State: Published - 1993
Event: 1993 Scalable Parallel Libraries Conference, SPLC 1993 - Mississippi State, United States
Duration: Oct 6 1993 - Oct 8 1993

Publication series

Name: Proceedings of Scalable Parallel Libraries Conference, SPLC 1993

Conference

Conference: 1993 Scalable Parallel Libraries Conference, SPLC 1993
Country/Territory: United States
City: Mississippi State
Period: 10/6/93 - 10/8/93
