TY - GEN
T1 - A performance instrumentation framework to characterize computation-communication overlap in message-passing systems
AU - Shet, Aniruddha G.
AU - Sadayappan, P.
AU - Bernholdt, David E.
AU - Nieplocha, Jarek
AU - Tipparaju, Vinod
PY - 2006
Y1 - 2006
AB - Effective overlap of computation and communication is a well-understood technique for latency hiding and can yield significant performance gains for applications on high-end computers. In this paper, we propose an instrumentation framework for message-passing systems to characterize the degree to which communication overlaps with computation during the execution of parallel applications. The inability to obtain precise timestamps for pertinent communication events is a significant problem; it is addressed by generating minimum and maximum bounds on the achieved overlap. The overlap measures can aid application developers and system designers in investigating scalability issues. The approach has been used to instrument two MPI implementations as well as the ARMCI system. The implementation resides entirely within the communication library and thus integrates well with existing approaches that operate outside the library. The usefulness of the framework is shown by analyzing the available overlap for microbenchmarks and NAS benchmarks, and the insights obtained are used to improve the achieved overlap by modifying the NAS SP benchmark.
UR - http://www.scopus.com/inward/record.url?scp=46049116864&partnerID=8YFLogxK
U2 - 10.1109/CLUSTR.2006.311887
DO - 10.1109/CLUSTR.2006.311887
M3 - Conference contribution
AN - SCOPUS:46049116864
SN - 1424403286
SN - 9781424403288
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
BT - 2006 IEEE International Conference on Cluster Computing, Cluster 2006
T2 - 2006 IEEE International Conference on Cluster Computing, Cluster 2006
Y2 - 25 September 2006 through 28 September 2006
ER -