TY - GEN
T1 - Durango
T2 - 5th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, SIGSIM-PADS 2017
AU - Carothers, Christopher D.
AU - Vetter, Jeffrey S.
AU - Meredith, Jeremy S.
AU - Mubarak, Misbah
AU - Moore, Shirley
AU - Blanco, Mark P.
AU - Lapre, Justin
N1 - Publisher Copyright: © 2017 Copyright held by the owner/author(s).
PY - 2017/5/16
Y1 - 2017/5/16
N2 - Performance modeling of extreme-scale applications on accurate representations of potential architectures is critical for designing next-generation supercomputing systems because it is impractical to construct prototype systems at scale with new network hardware in order to explore designs and policies. However, these simulations often rely on static application traces that can be difficult to work with because of their size and lack of flexibility to extend or scale up without rerunning the original application. To address this problem, we have created a new technique for generating scalable, flexible workloads from real applications. We have implemented a prototype, called Durango, that combines a proven analytical performance modeling language, Aspen, with the massively parallel HPC network modeling capabilities of the CODES framework. Our models are compact, parameterized and representative of real applications with computation events. They are not resource-intensive to create and are portable across simulator environments. We demonstrate the utility of Durango by simulating the LULESH application in the CODES simulation environment on several topologies and show that Durango is practical to use for simulation without loss of fidelity, as quantified by simulation metrics. During our validation of Durango's generated communication model of LULESH, we found that the original LULESH miniapp code had a latent bug where the MPI_Waitall operation was used incorrectly. This finding underscores the potential need for a tool such as Durango, beyond its benefits for flexible workload generation and modeling. Additionally, we demonstrate the efficacy of Durango's direct integration approach, which links Aspen into CODES as part of the running network simulation model. Here, Aspen generates the application-level computation timing events, which in turn drive the start of a network communication phase. Results show that Durango's performance scales well when executing both torus and dragonfly network models on up to 4K Blue Gene/Q nodes using 32K MPI ranks. Durango also avoids the overheads and complexities associated with extreme-scale trace files.
AB - Performance modeling of extreme-scale applications on accurate representations of potential architectures is critical for designing next-generation supercomputing systems because it is impractical to construct prototype systems at scale with new network hardware in order to explore designs and policies. However, these simulations often rely on static application traces that can be difficult to work with because of their size and lack of flexibility to extend or scale up without rerunning the original application. To address this problem, we have created a new technique for generating scalable, flexible workloads from real applications. We have implemented a prototype, called Durango, that combines a proven analytical performance modeling language, Aspen, with the massively parallel HPC network modeling capabilities of the CODES framework. Our models are compact, parameterized and representative of real applications with computation events. They are not resource-intensive to create and are portable across simulator environments. We demonstrate the utility of Durango by simulating the LULESH application in the CODES simulation environment on several topologies and show that Durango is practical to use for simulation without loss of fidelity, as quantified by simulation metrics. During our validation of Durango's generated communication model of LULESH, we found that the original LULESH miniapp code had a latent bug where the MPI_Waitall operation was used incorrectly. This finding underscores the potential need for a tool such as Durango, beyond its benefits for flexible workload generation and modeling. Additionally, we demonstrate the efficacy of Durango's direct integration approach, which links Aspen into CODES as part of the running network simulation model. Here, Aspen generates the application-level computation timing events, which in turn drive the start of a network communication phase. Results show that Durango's performance scales well when executing both torus and dragonfly network models on up to 4K Blue Gene/Q nodes using 32K MPI ranks. Durango also avoids the overheads and complexities associated with extreme-scale trace files.
KW - HPC networks models
KW - Massively parallel simulation
KW - Structural analytic models
UR - http://www.scopus.com/inward/record.url?scp=85020696114&partnerID=8YFLogxK
U2 - 10.1145/3064911.3064923
DO - 10.1145/3064911.3064923
M3 - Conference contribution
AN - SCOPUS:85020696114
T3 - SIGSIM-PADS 2017 - Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
SP - 97
EP - 108
BT - SIGSIM-PADS 2017 - Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
PB - Association for Computing Machinery, Inc
Y2 - 24 May 2017 through 26 May 2017
ER -