Scaling out a combinatorial algorithm for discovering carcinogenic gene combinations to thousands of GPUs

Sajal Dash, Qais Al-Hajri, Wu Chun Feng, Harold R. Garner, Ramu Anandakrishnan

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

Cancer is a leading cause of death in the US, second only to heart disease. It is primarily a result of a combination of an estimated two-nine genetic mutations (multi-hit combinations). Although a body of research has identified hundreds of cancer-causing genetic mutations, we don't know the specific combination of mutations responsible for specific instances of cancer for most cancer types. An approximate algorithm for solving the weighted set cover problem was previously adapted to identify combinations of genes with mutations that may be responsible for individual instances of cancer. However, the algorithm's computational requirement scales exponentially with the number of genes, making it impractical for identifying more than three-hit combinations, even after the algorithm was parallelized and scaled up to a V100 GPU. Since most cancers have been estimated to require more than three hits, we scaled out the algorithm to identify combinations of four or more hits using 1000 nodes (6000 V100 GPUs with ≈ 48× 106 processing cores) on the Summit supercomputer at Oak Ridge National Laboratory. Efficiently scaling out the algorithm required a series of algorithmic innovations and optimizations for balancing an exponentially divergent workload across processors and for minimizing memory latency and inter-node communication. We achieved an average strong scaling efficiency of 90.14% (80.96%-97.96% for 200 to 1000 nodes), compared to a 100 node run, with 84.18% scaling efficiency for 1000 nodes. With experimental validation, the multi-hit combinations identified here could provide further insight into the etiology of different cancer subtypes and provide a rational basis for targeted combination therapy.

Original languageEnglish
Title of host publicationProceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages837-846
Number of pages10
ISBN (Electronic)9781665440660
DOIs
StatePublished - May 2021
Event35th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2021 - Virtual, Online
Duration: May 17 2021May 21 2021

Publication series

NameProceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021

Conference

Conference35th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2021
CityVirtual, Online
Period05/17/2105/21/21

Funding

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was funded by the Edward Via College of Osteopathic Medicine (VCOM) FY20 REAP grant No. 10333

FundersFunder number
Edward Via College of Osteopathic Medicine
VCOM10333
U.S. Department of EnergyDE-AC05-00OR22725
Office of Science

    Keywords

    • Cancer genomics
    • GPU
    • Parallel computing
    • Set Cover algorithm

    Fingerprint

    Dive into the research topics of 'Scaling out a combinatorial algorithm for discovering carcinogenic gene combinations to thousands of GPUs'. Together they form a unique fingerprint.

    Cite this