CANDLE/Supervisor: A workflow framework for machine learning applied to cancer research

Justin M. Wozniak, Rajeev Jain, Prasanna Balaprakash, Jonathan Ozik, Nicholson T. Collier, John Bauer, Fangfang Xia, Thomas Brettin, Rick Stevens, Jamaludin Mohd-Yusof, Cristina Garcia Cardona, Brian Van Essen, Matthew Baughman

Research output: Contribution to journal › Article › peer-review

55 Scopus citations

Abstract

Background: Current multi-petaflop supercomputers are powerful systems, but they present challenges when faced with problems requiring large machine learning workflows. Complex algorithms running at system scale, often with different patterns that require disparate software packages and complex data flows, cause difficulties in assembling and managing large experiments on these machines. Results: This paper presents a workflow system that makes progress on scaling machine learning ensembles, specifically, in this first release, ensembles of deep neural networks that address problems in cancer research across the atomistic, molecular, and population scales. The initial release of the application framework, which we call CANDLE/Supervisor, addresses the problem of hyper-parameter exploration of deep neural networks. Conclusions: Initial results from running CANDLE on DOE systems at ORNL, ANL, and NERSC (Titan, Theta, and Cori, respectively) demonstrate both scaling and multi-platform execution.

Original language: English
Article number: 491
Journal: BMC Bioinformatics
Volume: 19
State: Published - Dec 21 2018
Externally published: Yes

Funding

This material is based upon work supported by the U.S. Department of Energy, Office of Science, under contract number DE-AC02-06CH11357. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This material is based upon work supported by the NIH (R01GM115839). Publication costs were funded by the CANDLE project under the Exascale Computing Project, a U.S. Department of Energy program.

Funders                                          Funder number
DOE Office of Science
U.S. Department of Energy Office of Science
National Institutes of Health
U.S. Department of Energy
National Institute of General Medical Sciences   R01GM115839
Office of Science                                17-SC-20-SC, DE-AC02-06CH11357
National Nuclear Security Administration

    Keywords

    • Article
    • Author
    • Sample
