TY - GEN
T1 - Communication characterization and optimization of applications using topology-aware task mapping on large supercomputers
AU - Sreepathi, Sarat
AU - D'Azevedo, Ed
AU - Philip, Bobby
AU - Worley, Patrick
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/3/12
Y1 - 2016/3/12
N2 - On large supercomputers, the job scheduling system may assign a non-contiguous node allocation to a user application, depending on available resources. With parallel applications using MPI (Message Passing Interface), the default process ordering does not take into account the actual physical node layout available to the application. This contributes to non-locality in terms of the physical network topology and degrades the communication performance of the application. To mitigate such performance penalties, this work describes techniques to identify a suitable task mapping that takes into account both the layout of the allocated nodes and the application's communication behavior. During the first phase of this research, we instrumented and collected performance data to characterize the communication behavior of critical US DOE (United States Department of Energy) applications using an augmented version of the mpiP tool. Subsequently, we developed several reordering methods (spectral bisection, neighbor join tree, etc.) that combine node layout and application communication data for optimized task placement. We developed a tool called mpiAproxy to facilitate detailed evaluation of the various reordering algorithms without requiring full application executions. This work presents a comprehensive performance evaluation (14,000 experiments) of the various task mapping techniques in lowering communication costs on Titan, the leadership-class supercomputer at Oak Ridge National Laboratory.
KW - Communication characterization
KW - Reordering algorithms
KW - Topology-aware optimization
UR - http://www.scopus.com/inward/record.url?scp=85020214431&partnerID=8YFLogxK
U2 - 10.1145/2851553.2851575
DO - 10.1145/2851553.2851575
M3 - Conference contribution
AN - SCOPUS:85020214431
T3 - ICPE 2016 - Proceedings of the 7th ACM/SPEC International Conference on Performance Engineering
SP - 225
EP - 236
BT - ICPE 2016 - Proceedings of the 7th ACM/SPEC International Conference on Performance Engineering
PB - Association for Computing Machinery, Inc
T2 - 7th ACM/SPEC International Conference on Performance Engineering, ICPE 2016
Y2 - 12 March 2016 through 16 March 2016
ER -