Abstract
The implementation and performance of a class of divide-and-conquer algorithms for computing the spectral decomposition of nonsymmetric matrices on distributed memory parallel computers are studied in this paper. After presenting a general framework, we focus on a spectral divide-and-conquer (SDC) algorithm with Newton iteration. Although the algorithm requires several times as many floating point operations as the best serial QR algorithm, it can be simply constructed from a small set of highly parallelizable matrix building blocks within Level 3 basic linear algebra subroutines (BLAS). Efficient implementations of these building blocks are available on a wide range of machines. In some ill-conditioned cases, the algorithm may lose numerical stability, but this can easily be detected and compensated for. The algorithm reached 31% efficiency with respect to the underlying PUMMA matrix multiplication and 82% efficiency with respect to the underlying ScaLAPACK matrix inversion on a 256 processor Intel Touchstone Delta system, and 41% efficiency with respect to the matrix multiplication in CMSSL on a 32 node Thinking Machines CM-5 with vector units. Our performance model predicts the performance reasonably accurately. To take advantage of the geometric nature of SDC algorithms, we have designed a graphical user interface to let the user choose the spectral decomposition according to specified regions in the complex plane.
Original language | English |
---|---|
Pages (from-to) | 1446-1461 |
Number of pages | 16 |
Journal | SIAM Journal on Scientific Computing |
Volume | 18 |
Issue number | 5 |
DOIs | |
State | Published - Sep 1997 |
Keywords
- Eigenvalue problem
- Invariant subspaces
- Nonsymmetric matrices
- Parallelizable
- ScaLAPACK
- Spectral decomposition
- Spectral divide-and-conquer