Distributed-memory lattice H-matrix factorization

Ichitaro Yamazaki, Akihiro Ida, Rio Yokota, Jack Dongarra

Research output: Contribution to journalArticlepeer-review

14 Scopus citations

Abstract

We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than the H-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder than the parallelization of a dense matrix factorization due to the irregular hierarchical block structure of the matrix. Block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at a price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonals and off-diagonals) in the H-matrix format. These blocks stored in the linear complexity of the-matrix format are referred to as lattices. Thus, this lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of linear complexity of the-matrix factorization. We first compare factorization performances using the L-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower cost of factorization than BLR. We then compare the BLR and lattice (H-matrix factorization on distributed-memory computers. Our performance results demonstrate that compared with BLR, the lattice format with the lower cost of factorization may lead to faster factorization on the distributed-memory computer.

Original languageEnglish
Pages (from-to)1046-1063
Number of pages18
JournalInternational Journal of High Performance Computing Applications
Volume33
Issue number5
DOIs
StatePublished - Sep 1 2019
Externally publishedYes

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by Joint Usage/Research Center for Interdisciplinary Large-Scale Information Infrastructures and High Performance Computing Infrastructure in Japan (Project ID: jh170057); JSPS KAKENHI under grant numbers #17K19962, #17H01749, and #18H03248; the National Science Foundation under grant no 1740250; the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration, under prime contract #DE-AC05-00OR22725; and UT Battelle subaward #4000152412.

Keywords

  • LU factorization
  • boundary element method
  • distributed memory
  • hierarchical matrix
  • task programming

Fingerprint

Dive into the research topics of 'Distributed-memory lattice H-matrix factorization'. Together they form a unique fingerprint.

Cite this