Distributing Simplex-Shaped Nested for-Loops to Identify Carcinogenic Gene Combinations

Sajal Dash, Mohammad Alaul Haque Monil, Junqi Yin, Ramu Anandakrishnan, Feiyi Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Cancer is a leading cause of death in the US, and it results from a combination of two to nine genetic mutations. Identifying the five-hit combinations responsible for several cancer types is computationally intractable even with the fastest supercomputers in the US. Iterating through the nested loops required by the process presents a simplex-shaped workload with irregular memory access patterns. Distributing this workload efficiently across thousands of GPUs poses the challenge of dividing a simplex-shaped (triangular/tetrahedral) workload into similar shapes of equal volume. Irregular memory access patterns create imbalanced compute utilization across nodes. We developed a generalized solution for distributing a simplex-shaped workload by partially coalescing the nested for-loops, minimizing the memory access overhead by efficiently utilizing limited shared memory, a dynamic scheduler, and loop tiling. For 4-hit combinations, we achieved 90-100% strong scaling efficiency on up to 3594 V100 GPUs on the Summit supercomputer. Finally, we designed and implemented a distributed algorithm to identify 5-hit combinations for four different cancer types; the identified combinations can differentiate between cancer and normal samples with 86.59-88.79% precision and 84.42-90.91% recall. We also demonstrated the robustness of our solution by porting the code to another leadership-class computing platform, Crusher, a testbed for Frontier, the fastest supercomputer. On Crusher, we achieved 98% strong scaling efficiency on 50 nodes (400 AMD MI250X GCDs) and demonstrated the computational readiness of Frontier for scientific applications.
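To make the idea of a simplex-shaped workload concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest case, a two-level triangular loop over gene pairs (i, j) with i < j, fully coalesces it into one linear index, and splits that index range into near-equal contiguous chunks. The function names (`unrank_pair`, `split_range`, `pairs_for_worker`) and the sizes `n_genes` and `n_workers` are hypothetical; the abstract describes partial coalescing for deeper loops, which this sketch does not attempt.

```python
import math


def unrank_pair(k):
    """Map linear index k to the pair (i, j) with 0 <= i < j.

    Pairs are ordered by j first, then i, so k = j*(j-1)/2 + i.
    """
    # Largest j such that j*(j-1)/2 <= k, computed with exact integer sqrt.
    j = (1 + math.isqrt(1 + 8 * k)) // 2
    i = k - j * (j - 1) // 2
    return i, j


def split_range(total, parts):
    """Split range(total) into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(total, parts)
    bounds = []
    start = 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def pairs_for_worker(n, parts, rank):
    """Return the (i, j) pairs assigned to worker `rank` out of `parts` workers."""
    total = n * (n - 1) // 2  # number of pairs in the triangular loop
    lo, hi = split_range(total, parts)[rank]
    return [unrank_pair(k) for k in range(lo, hi)]


# Hypothetical sizes chosen only for illustration.
n_genes, n_workers = 6, 4
chunks = [pairs_for_worker(n_genes, n_workers, r) for r in range(n_workers)]
for rank, chunk in enumerate(chunks):
    print(rank, chunk)
```

Splitting the coalesced index, rather than the outer loop variable, is what keeps the per-worker counts nearly equal: each outer iteration of a triangular loop carries a different amount of inner work, so dividing the outer range evenly would not divide the work evenly.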

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 974-984
Number of pages: 11
ISBN (Electronic): 9798350337662
DOIs
State: Published - 2023
Event: 37th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023 - St. Petersburg, United States
Duration: May 15 2023 to May 19 2023

Publication series

Name: Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023

Conference

Conference: 37th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023
Country/Territory: United States
City: St. Petersburg
Period: 05/15/23 to 05/19/23

Funding

This work was supported by the resources of the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at ORNL, which is managed by UT-Battelle, LLC for the U.S. DOE (under contract No. DE-AC05-00OR22725).

Funder: U.S. Department of Energy
Funder number: DE-AC05-00OR22725

    Keywords

    • Cancer genomics
    • nested loops
    • scheduler
    • simplex
