TY - GEN
T1 - Adaptive Patching for High-resolution Image Segmentation with Transformers
AU - Zhang, Enzhi
AU - Lyngaas, Isaac
AU - Chen, Peng
AU - Wang, Xiao
AU - Igarashi, Jun
AU - Huo, Yuankai
AU - Munetomo, Masaharu
AU - Wahib, Mohamed
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Attention-based models are proliferating in the space of image analytics, including segmentation. The standard method of feeding images to transformer encoders is to divide the images into patches and then feed the patches to the model as a linear sequence of tokens. For high-resolution images, e.g., microscopic pathology images, the quadratic compute and memory cost prohibits the use of attention-based models if we are to use the smaller patch sizes that are favorable in segmentation. The solution is to either use custom complex multi-resolution models or approximate attention schemes. We take inspiration from Adaptive Mesh Refinement (AMR) methods in HPC by adaptively patching the images, as a pre-processing step, based on the image details to reduce the number of patches fed to the model by orders of magnitude. This method has negligible overhead and works seamlessly with any attention-based model, i.e., it is a pre-processing step that can be adopted by any attention-based model without friction. We demonstrate superior segmentation quality over SoTA segmentation models on real-world pathology datasets while gaining a geomean speedup of 6.9× for resolutions up to 64K², on up to 2,048 GPUs.
AB - Attention-based models are proliferating in the space of image analytics, including segmentation. The standard method of feeding images to transformer encoders is to divide the images into patches and then feed the patches to the model as a linear sequence of tokens. For high-resolution images, e.g., microscopic pathology images, the quadratic compute and memory cost prohibits the use of attention-based models if we are to use the smaller patch sizes that are favorable in segmentation. The solution is to either use custom complex multi-resolution models or approximate attention schemes. We take inspiration from Adaptive Mesh Refinement (AMR) methods in HPC by adaptively patching the images, as a pre-processing step, based on the image details to reduce the number of patches fed to the model by orders of magnitude. This method has negligible overhead and works seamlessly with any attention-based model, i.e., it is a pre-processing step that can be adopted by any attention-based model without friction. We demonstrate superior segmentation quality over SoTA segmentation models on real-world pathology datasets while gaining a geomean speedup of 6.9× for resolutions up to 64K², on up to 2,048 GPUs.
UR - http://www.scopus.com/inward/record.url?scp=85214982143&partnerID=8YFLogxK
U2 - 10.1109/SC41406.2024.00082
DO - 10.1109/SC41406.2024.00082
M3 - Conference contribution
AN - SCOPUS:85214982143
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2024
PB - IEEE Computer Society
T2 - 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024
Y2 - 17 November 2024 through 22 November 2024
ER -