Abstract
Whole Slide Imaging (WSI) captures microscopic details of a patient's histopathological features at multiple resolutions organized across different levels. Images produced by WSI are gigapixel-sized, and holding a single image in memory requires a few gigabytes, a scarce resource when a complex model already occupies tens of gigabytes. Performing even a simple metric operation on these large images is also expensive. High-performance computing (HPC) can help us quickly analyze such large images using distributed training of complex deep learning models. One popular approach to analyzing these images is to divide a WSI image into smaller tiles (patches) and then train a simpler model on this large number of reduced-size patches. However, pursuing this patch-based approach requires solving three pre-processing challenges efficiently. 1) Creating small patches from a high-resolution image can result in a high number of patches (hundreds of thousands per image). Storing and processing these patches can be challenging due to the large number of I/O and arithmetic operations, so an optimal balance between the size and number of patches must be struck to reduce I/O and memory accesses. 2) WSI images may have tiny annotated regions for cancer tissue and a significant portion with normal and fatty tissues; correct patch sampling should avoid dataset imbalance. 3) Storing and retrieving many patches to and from disk storage might incur I/O latency while training a deep learning model. An efficient distributed data loader should reduce I/O latency during the training and inference steps. This paper explores these three challenges and provides empirical and algorithmic solutions deployed on the Summit supercomputer hosted at the Oak Ridge Leadership Computing Facility.
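The trade-off between patch size and patch count, and the idea of class-balanced patch sampling, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `patch_grid` and `balanced_sample`, the 100,000 x 100,000 pixel slide, and the tissue labels are all hypothetical values chosen only for illustration.

```python
import random
from collections import defaultdict


def patch_grid(width, height, patch, stride=None):
    """Count the patches that fit in a width x height image without padding.

    Hypothetical helper: returns (columns, rows, total).
    """
    stride = stride or patch
    cols = (width - patch) // stride + 1 if width >= patch else 0
    rows = (height - patch) // stride + 1 if height >= patch else 0
    return cols, rows, cols * rows


def balanced_sample(labels, per_class, seed=0):
    """Pick up to per_class patch indices for every label.

    Hypothetical helper: caps each class at the same size so that a rare
    class is not swamped by a common one.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    picked = []
    for label in sorted(by_label):
        pool = by_label[label]
        picked.extend(rng.sample(pool, min(per_class, len(pool))))
    return picked


# Assumed slide size; real WSI dimensions vary by scanner and level.
small = patch_grid(100_000, 100_000, 256)   # many small patches
large = patch_grid(100_000, 100_000, 512)   # fewer, larger patches

# Assumed label distribution: 5 tumor patches among 100.
labels = ["tumor"] * 5 + ["normal"] * 95
subset = balanced_sample(labels, per_class=5)
```

Under these assumptions, doubling the patch edge from 256 to 512 pixels cuts the patch count by roughly a factor of four, which is the kind of size-versus-count balance the abstract refers to. The sampler simply downsamples the majority class; other balancing strategies are equally plausible.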
Original language | English |
---|---|
Title of host publication | Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1266-1274 |
Number of pages | 9 |
ISBN (Electronic) | 9781665497473 |
State | Published - 2022 |
Event | 36th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022 - Virtual, Online, France. Duration: May 30 2022 → Jun 3 2022 |
Publication series
Name | Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022 |
---|---|
Conference
Conference | 36th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022 |
---|---|
Country/Territory | France |
City | Virtual, Online |
Period | 05/30/22 → 06/03/22 |
Funding
This manuscript has been authored by UT-Battelle LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of the manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Acknowledgements
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Keywords
- WSI
- cancer image
- database
- image analysis
- lmdb
- pipeline
- sampling