Abstract
Gigapixel images are prevalent in scientific domains ranging from remote sensing and satellite imagery to microscopy. However, training a deep learning model at the native resolution of such images has been challenging, both in overcoming resource limits (e.g., limited GPU HBM capacity) and in scaling up to a large number of GPUs. In this paper, we trained Residual Neural Networks (ResNets) on 22,528 × 22,528-pixel images using a distributed spatial decomposition method on 2,304 GPUs of the Summit supercomputer. We applied our method to a Whole Slide Imaging (WSI) dataset from The Cancer Genome Atlas (TCGA) database. WSIs can be 100,000 × 100,000 pixels or even larger; in this work we studied the effect of image resolution on a classification task while achieving state-of-the-art AUC scores. Moreover, because our approach avoids patching the WSIs entirely, it does not need pixel-level labels, and it adds the capability of training on arbitrarily large images. This is achieved through the distributed spatial decomposition method, which leverages the non-blocking fat-tree interconnect of the Summit architecture to enable direct GPU-to-GPU communication. Finally, we present a detailed performance analysis, as well as a comparison with a data-parallel approach where possible.
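To see why decomposition is needed, consider a back-of-envelope estimate, assuming fp32 RGB input and a standard ResNet stem (7 × 7 convolution, stride 2, 64 channels): the 22,528 × 22,528 × 3 input alone occupies about 6.1 GB, and the stem's 11,264 × 11,264 × 64 output about 32.5 GB, already more than the 16 GB of HBM2 on one Summit V100. The abstract does not describe how the spatial decomposition is implemented, so the following is only a minimal single-process sketch of the general halo-exchange idea, not the authors' code. It splits one image into a grid of tiles (one per hypothetical "rank"), gives each tile an r-wide border copied from its eight neighbours (r = kernel size // 2), convolves locally, and checks that the stitched result equals a convolution over the whole image. All function names (`conv2d_valid`, `conv2d_same`, `halo_exchange`, `decomposed_conv`) are illustrative, and it assumes a single channel, stride 1, an odd square kernel, evenly divisible tiles, and a halo no wider than a tile.

```python
import numpy as np


def conv2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Single-channel, stride-1 2-D cross-correlation without padding."""
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(kh):  # accumulate one kernel tap at a time
        for j in range(kw):
            out += w[i, j] * x[i:i + out_h, j:j + out_w]
    return out


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Global reference: zero-pad by the kernel radius, then convolve."""
    r = w.shape[0] // 2  # assumes an odd, square kernel
    return conv2d_valid(np.pad(x, r), w)


def _span(d: int, n: int, r: int):
    """(source, destination) slices along one axis for neighbour offset d."""
    if d == -1:  # neighbour above/left: its last r rows/cols
        return slice(n - r, n), slice(0, r)
    if d == 0:   # the rank's own tile goes in the centre
        return slice(0, n), slice(r, r + n)
    return slice(0, r), slice(r + n, n + 2 * r)  # below/right: its first r


def halo_exchange(tiles, a: int, b: int, r: int) -> np.ndarray:
    """Return tile (a, b) extended by an r-wide halo from its 8 neighbours.

    Neighbours outside the global image contribute nothing, so the halo
    stays zero there, matching zero padding of the full image. On real
    hardware each copy below would be a message between neighbouring GPUs.
    """
    py, px = len(tiles), len(tiles[0])
    th, tw = tiles[a][b].shape
    local = np.zeros((th + 2 * r, tw + 2 * r), dtype=tiles[a][b].dtype)
    for da in (-1, 0, 1):
        for db in (-1, 0, 1):
            na, nb = a + da, b + db
            if 0 <= na < py and 0 <= nb < px:
                src_r, dst_r = _span(da, th, r)
                src_c, dst_c = _span(db, tw, r)
                local[dst_r, dst_c] = tiles[na][nb][src_r, src_c]
    return local


def decomposed_conv(x: np.ndarray, w: np.ndarray, grid) -> np.ndarray:
    """Simulate a py x px spatial decomposition of one convolution layer."""
    py, px = grid
    r = w.shape[0] // 2
    th, tw = x.shape[0] // py, x.shape[1] // px
    assert th * py == x.shape[0] and tw * px == x.shape[1], "uneven tiles"
    assert r <= min(th, tw), "halo wider than a tile"
    # Each hypothetical rank owns exactly one tile of the input.
    tiles = [[x[a * th:(a + 1) * th, b * tw:(b + 1) * tw] for b in range(px)]
             for a in range(py)]
    out = np.empty_like(x)
    for a in range(py):
        for b in range(px):
            local = halo_exchange(tiles, a, b, r)
            out[a * th:(a + 1) * th, b * tw:(b + 1) * tw] = conv2d_valid(local, w)
    return out


rng = np.random.default_rng(0)
image = rng.standard_normal((96, 96))
kernel = rng.standard_normal((7, 7))
ref = conv2d_same(image, kernel)
dec = decomposed_conv(image, kernel, grid=(4, 3))
print(np.allclose(ref, dec))  # True
```

Each rank only needs r rows or columns from each neighbour, so communication grows with the tile perimeter while computation grows with the tile area; this ratio is the usual reason spatial decomposition can scale to many GPUs.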
Original language | English |
---|---|
Title of host publication | Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2023 |
Publisher | Association for Computing Machinery, Inc |
ISBN (Electronic) | 9798400701900 |
State | Published - Jun 26 2023 |
Event | 2023 Platform for Advanced Scientific Computing Conference, PASC 2023, Davos, Switzerland (Jun 26 2023 → Jun 28 2023) |
Publication series
Name | Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2023 |
---|---|
Conference
Conference | 2023 Platform for Advanced Scientific Computing Conference, PASC 2023 |
---|---|
Country/Territory | Switzerland |
City | Davos |
Period | Jun 26 2023 → Jun 28 2023 |
Funding
This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Keywords
- convolutional neural networks
- distributed deep learning
- medical imaging
- model parallelism
- spatial decomposition