TY - GEN
T1 - An empirical performance evaluation of scalable scientific applications
AU - Vetter, Jeffrey S.
AU - Yoo, Andy
N1 - Publisher Copyright: © 2002 IEEE.
PY - 2002
Y1 - 2002
AB - We investigate the scalability, architectural requirements, and performance characteristics of eight scalable scientific applications. Our analysis is driven by empirical measurements using statistical and tracing instrumentation for both communication and computation. Based on these measurements, we refine our analysis into precise explanations of the factors that influence performance and scalability for each application; we distill these factors into common traits and overall recommendations for both users and designers of scalable platforms. Our experiments demonstrate that some traits, such as improvements in the scaling and performance of MPI's collective operations, will benefit most applications. We also find specific characteristics of some applications that limit performance. For example, one application's intensive use of a 64-bit, floating-point divide instruction, which has high latency and is not pipelined on the POWER3, limits the performance of the application's primary computation.
UR - http://www.scopus.com/inward/record.url?scp=85117198273&partnerID=8YFLogxK
U2 - 10.1109/SC.2002.10036
DO - 10.1109/SC.2002.10036
M3 - Conference contribution
AN - SCOPUS:85117198273
T3 - Proceedings of the International Conference on Supercomputing
BT - Proceedings of the IEEE/ACM SC 2002 Conference, SC 2002
PB - Association for Computing Machinery
T2 - 2002 IEEE/ACM Conference on Supercomputing, SC 2002
Y2 - 16 November 2002 through 22 November 2002
ER -