Abstract
Background: Current multi-petaflop supercomputers are powerful systems but present challenges for problems that require large machine learning workflows. Complex algorithms running at system scale, often with different patterns that require disparate software packages and complex data flows, make it difficult to assemble and manage large experiments on these machines. Results: This paper presents a workflow system that makes progress on scaling machine learning ensembles; in this first release, these are ensembles of deep neural networks that address problems in cancer research across the atomistic, molecular and population scales. The initial release of the application framework, which we call CANDLE/Supervisor, addresses the problem of hyper-parameter exploration of deep neural networks. Conclusions: Initial results from running CANDLE on DOE systems at ORNL, ANL and NERSC (Titan, Theta and Cori, respectively) demonstrate both scaling and multi-platform execution.
Original language | English |
---|---|
Article number | 491 |
Journal | BMC Bioinformatics |
Volume | 19 |
DOIs | |
State | Published - Dec 21 2018 |
Externally published | Yes |
Funding
This material is based upon work supported by the U.S. Department of Energy, Office of Science, under contract number DE-AC02-06CH11357. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This material is based upon work supported by the NIH (R01GM115839). Publication costs were funded by the CANDLE project under the Exascale Computing Project, a U.S. Department of Energy program.
Funders | Funder number |
---|---|
DOE Office of Science | |
U.S. Department of Energy Office of Science | |
National Institutes of Health | |
U.S. Department of Energy | |
National Institute of General Medical Sciences | R01GM115839 |
Office of Science | 17-SC-20-SC, DE-AC02-06CH11357 |
National Nuclear Security Administration | |