Parallel matrix transpose algorithms on distributed memory concurrent computers

Jaeyoung Choi, Jack J. Dongarra, David W. Walker

Research output: Contribution to journal › Article › peer-review


Abstract

This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block cyclic data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The communication schemes of the algorithms are determined by the greatest common divisor (GCD) of P and Q. If P and Q are relatively prime, the matrix transpose algorithm involves complete exchange communication. If P and Q are not relatively prime, processors are divided into GCD groups and the communication operations are overlapped for different groups of processors. Processors transpose GCD wrapped diagonal blocks simultaneously, and the matrix can be transposed with LCM/GCD steps, where LCM is the least common multiple of P and Q. The algorithms make use of non-blocking, point-to-point communication between processors. The use of non-blocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A · B, the algorithms are used to compute parallel multiplications of transposed matrices, C = A^T · B^T, in the PUMMA package [5]. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
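
The communication structure the abstract describes can be illustrated with a short sketch. Under a block cyclic distribution on a P × Q grid, global block (I, J) lives on processor (I mod P, J mod Q), and transposing moves block (I, J) to the home of block (J, I). The sketch below (a hypothetical illustration, not the paper's implementation; the function names are ours) enumerates each processor's distinct send destinations over one LCM × LCM tile of blocks, after which the pattern repeats; their count matches the LCM/GCD step count stated in the abstract, and for relatively prime P and Q it equals P·Q, i.e. complete exchange.

```python
from math import gcd

def owner(I, J, P, Q):
    # Block cyclic distribution: global block (I, J) is stored on
    # processor (I mod P, J mod Q) of the P x Q template.
    return (I % P, J % Q)

def destinations(p, q, P, Q):
    """Distinct processors to which processor (p, q) must send blocks
    during the transpose: block (I, J) of A becomes block (J, I) of A^T.
    One LCM x LCM tile of blocks suffices, since the ownership pattern
    repeats with period LCM in each dimension."""
    lcm = P * Q // gcd(P, Q)
    dests = set()
    for I in range(p, lcm, P):        # rows owned by this processor: I ≡ p (mod P)
        for J in range(q, lcm, Q):    # columns owned: J ≡ q (mod Q)
            dests.add(owner(J, I, P, Q))
    return dests

P, Q = 4, 6                           # GCD = 2, LCM = 12
g, lcm = gcd(P, Q), P * Q // gcd(P, Q)
for p in range(P):
    for q in range(Q):
        # every processor has LCM/GCD = 6 distinct destinations
        assert len(destinations(p, q, P, Q)) == lcm // g
```

With P = 4, Q = 3 (relatively prime) the same count becomes 12 = P·Q, so every processor communicates with every processor, consistent with the complete exchange case described above.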

Original language: English
Pages (from-to): 1387-1405
Number of pages: 19
Journal: Parallel Computing
Volume: 21
Issue number: 9
DOIs
State: Published - Sep 1995

Funding

* Research was supported by the Applied Mathematical Sciences Research Program of the Office of Energy Research, U.S. Department of Energy, by the Defense Advanced Research Projects Agency under contract DAAL03-91-C-0047, administered by the Army Research Office, and by the Center for Research on Parallel Computing.
* Communicating author. Email: [email protected].

Funders (funder number):

• Center for Research on Parallel Computing
• Office of Energy Research
• U.S. Department of Energy
• Army Research Office
• Defense Advanced Research Projects Agency (DAAL03-91-C-0047)

Keywords

• Distributed memory multiprocessors
• Intel Touchstone Delta
• Linear algebra
• Matrix transpose algorithm
• Point-to-point communication
