TY - GEN
T1 - Exploring Vision Transformers on the Frontier Supercomputer for Remote Sensing and Geoscientific Applications
AU - Anantharaj, Valentine
AU - Kurihana, Takuya
AU - Dash, Sajal
AU - Padovani, Gabriele
AU - Fiore, Sandro
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - The earth sciences research community has an unprecedented opportunity to exploit the vast amount of data available from earth observation (EO) satellites and earth system models (ESM). The ascent and application of artificial intelligence foundation models (FM) can be attributed to the availability of large volumes of curated data, access to extensive computing resources and the maturity of deep learning techniques. Vision transformer (ViT) architectures have been adapted for image and image-like data, such as EO data and ESM simulation output. Pretraining foundation models is a compute-intensive process, often requiring 10⁵–10⁷ GPU hours for large-scale scientific applications. There is a limited body of knowledge on compute-optimal methods for pretraining, necessitating a trial-and-error process. We have performed a series of experiments using ViT backbones at different scales to understand optimal and cost-effective ways to improve scientific throughput. This preliminary benchmark provides an assessment of which architectures and model configurations are favorable in a given scientific context.
AB - The earth sciences research community has an unprecedented opportunity to exploit the vast amount of data available from earth observation (EO) satellites and earth system models (ESM). The ascent and application of artificial intelligence foundation models (FM) can be attributed to the availability of large volumes of curated data, access to extensive computing resources and the maturity of deep learning techniques. Vision transformer (ViT) architectures have been adapted for image and image-like data, such as EO data and ESM simulation output. Pretraining foundation models is a compute-intensive process, often requiring 10⁵–10⁷ GPU hours for large-scale scientific applications. There is a limited body of knowledge on compute-optimal methods for pretraining, necessitating a trial-and-error process. We have performed a series of experiments using ViT backbones at different scales to understand optimal and cost-effective ways to improve scientific throughput. This preliminary benchmark provides an assessment of which architectures and model configurations are favorable in a given scientific context.
KW - artificial intelligence
KW - benchmarking
KW - foundation models
KW - high performance computing
UR - http://www.scopus.com/inward/record.url?scp=85204908140&partnerID=8YFLogxK
U2 - 10.1109/IGARSS53475.2024.10640929
DO - 10.1109/IGARSS53475.2024.10640929
M3 - Conference contribution
AN - SCOPUS:85204908140
T3 - International Geoscience and Remote Sensing Symposium (IGARSS)
SP - 3085
EP - 3088
BT - IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2024
Y2 - 7 July 2024 through 12 July 2024
ER -