A scalable, parallel algorithm for maximal clique enumeration

Matthew C. Schmidt, Nagiza F. Samatova, Kevin Thomas, Byung Hoon Park

Research output: Contribution to journalArticlepeer-review

120 Scopus citations

Abstract

The problem of maximal clique enumeration (MCE) is to enumerate all of the maximal cliques in a graph. Once enumerated, maximal cliques are widely used to solve problems in areas such as 3-D protein structure alignment, genome mapping, gene expression analysis, and detection of social hierarchies. Even the most efficient serial MCE algorithms require large amounts of time to enumerate the maximal cliques in networks arising from these problems that contain hundreds, thousands, or larger numbers of vertices. The previous attempts to provide practical solutions to the MCE problem through parallel implementation have had limited success, largely due to a number of challenges inherent to the nature of the MCE combinatorial search space. On the one hand, MCE algorithms often create a backtracking search tree that has a highly irregular and hard-or-impossible to predict structure; therefore, almost any static decomposition of the search tree by parallel processors results in highly unbalanced processor execution times. On the other hand, the data-intensive nature of the MCE problem often makes naive dynamic load distribution strategies that require extensive data movement prohibitively expensive. As a result, good scaling of the overall execution time of parallel MCE algorithms has been reported for only up to a couple hundred processors. In this paper, we propose a parallel, scalable, and memory-efficient MCE algorithm for distributed and/or shared memory high performance computing architectures, whose runtime scales linearly for thousands of processors on real-world application graphs with hundreds and thousands of nodes. Its scalability and efficiency are attributed to the proposed: (a) representation of the search tree decomposition to enable parallelization; (b) parallel depth-first backtracking search to both constrain the search space and minimize memory requirement; (c) least stringent synchronization to minimize data movement; and (d) on-demand work stealing intelligently coupled with work stack splitting to minimize computing elements' idle time. To the best of our knowledge, the proposed parallel MCE algorithm is the first to achieve a linear scaling runtime using up to 2048 processors on Cray XT machines for a number of real-world biological networks.

Original languageEnglish
Pages (from-to)417-428
Number of pages12
JournalJournal of Parallel and Distributed Computing
Volume69
Issue number4
DOIs
StatePublished - Apr 2009

Funding

The authors are thankful to the reviewers for their insightful comments and to Cray Inc. for the access to large-scale Cray XT systems and the insights into the code optimization and benchmarks. This research has been supported by the “Exploratory Data Intensive Computing for Complex Biological Systems” project from US Department of Energy (Office of Advanced Scientific Computing Research, Office of Science). The work of MCS was also sponsored by the Dean of the Department of Computer Science at North Carolina State University. The work of NFS was also sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory. Oak Ridge National Laboratory is managed by UT-Battelle for the LLC US D.O.E. under contract no. DEAC05-00OR22725.

Keywords

  • Biological networks
  • Cray XT
  • Dynamic load balancing
  • High-performance computing
  • Maximal clique enumeration
  • Parallel graph algorithms

Fingerprint

Dive into the research topics of 'A scalable, parallel algorithm for maximal clique enumeration'. Together they form a unique fingerprint.

Cite this