TY - JOUR
T1 - Toucan
T2 - A performance portable, scalable implementation of the DECA algorithm
AU - Stump, Benjamin C.
AU - Arndt, Daniel
AU - Rolchigo, Matt
AU - Reeve, Samuel Temple
N1 - Publisher Copyright: © 2025 Elsevier B.V.
PY - 2025/3
Y1 - 2025/3
AB - In the field of additive manufacturing (AM), cellular automata (CA) is extensively used to simulate microstructural evolution during solidification. However, while traditional CA approaches are relatively fast, they still require a substantial number of time steps, are limited to moderate volumes, and are relatively difficult to improve through parallelism due to the highly localized nature of the solidification front. To address these issues of time to solution and load balancing, we introduce Toucan, a parallel, performance-portable, and scalable code written in C++ with the Kokkos library that leverages the discrete event inspired cellular automata (DECA) algorithm to perform parallel-in-time (PinT) grain growth simulations. Toucan effectively mitigates load balancing issues by distributing the computational workload more evenly across processors, enhancing scalability and efficiency. We conduct both strong and weak scaling studies on up to 64 GPUs on the Frontier supercomputer, demonstrating that Toucan significantly outperforms the current state-of-the-art, time-stepped CA code, ExaCA, on both single and multi-GPU simulations. Even in AM-specific weak scaling scenarios, Toucan maintains near-ideal scaling, in contrast to the linear increase observed with ExaCA due to the moving laser raster pattern. This study highlights Toucan's potential to transform microstructural simulations in AM by radically improving both efficiency and scalability over existing methods.
KW - Additive manufacturing
KW - Cellular automata
KW - Discrete-event
KW - Microstructure simulation
KW - Parallel-in-time
UR - http://www.scopus.com/inward/record.url?scp=85216860918&partnerID=8YFLogxK
U2 - 10.1016/j.commatsci.2025.113684
DO - 10.1016/j.commatsci.2025.113684
M3 - Article
AN - SCOPUS:85216860918
SN - 0927-0256
VL - 251
JO - Computational Materials Science
JF - Computational Materials Science
M1 - 113684
ER -