TY - GEN
T1 - OReole-FM
T2 - 32nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL 2024
AU - Dias, Philipe
AU - Tsaris, Aristeidis
AU - Bowman, Jordan
AU - Potnis, Abhishek
AU - Arndt, Jacob
AU - Yang, H. Lexie
AU - Lunga, Dalton
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/11/22
Y1 - 2024/11/22
N2 - While the pretraining of Foundation Models (FMs) for remote sensing (RS) imagery is on the rise, models remain restricted to a few hundred million parameters. Scaling models to billions of parameters has been shown to yield unprecedented benefits, including emergent abilities, but requires data scaling and computing resources typically not available outside industry R&D labs. In this work, we pair high-performance computing resources, including the Frontier supercomputer, America’s first exascale system, with high-resolution optical RS data to pretrain billion-scale FMs. Our study assesses the performance of different pretrained variants of vision Transformers across image classification, semantic segmentation, and object detection benchmarks, highlighting the importance of data scaling for effective model scaling. Moreover, we discuss the construction of a novel TIU pretraining dataset and model initialization, with the data and pretrained models intended for public release. By discussing technical challenges and details often lacking in the related literature, this work aims to offer best practices to the geospatial community toward efficient training and benchmarking of larger FMs.
KW - Earth Observation
KW - Foundation Models
KW - High-resolution satellite imagery
KW - Remote Sensing
KW - Self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85214729433&partnerID=8YFLogxK
U2 - 10.1145/3678717.3691292
DO - 10.1145/3678717.3691292
M3 - Conference contribution
AN - SCOPUS:85214729433
T3 - 32nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL 2024
SP - 597
EP - 600
BT - 32nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL 2024
A2 - Nascimento, Mario A.
A2 - Xiong, Li
A2 - Züfle, Andreas
A2 - Chiang, Yao-Yi
A2 - Eldawy, Ahmed
A2 - Kröger, Peer
PB - Association for Computing Machinery, Inc
Y2 - 29 October 2024 through 1 November 2024
ER -