TY - GEN
T1 - Impact of quad-core Cray XT4 system and software stack on scientific computation
AU - Alam, S. R.
AU - Barrett, R. F.
AU - Jagode, H.
AU - Kuehn, J. A.
AU - Poole, S. W.
AU - Sankaran, R.
PY - 2009
Y1 - 2009
N2 - An upgrade from dual-core to quad-core AMD processors on the Cray XT system at the Oak Ridge National Laboratory (ORNL) Leadership Computing Facility (LCF) has resulted in significant changes in the hardware and software stack, including a deeper memory hierarchy, SIMD instructions and a multi-core aware MPI library. In this paper, we evaluate the impact of a subset of these key changes on large-scale scientific applications. We provide insights into the application tuning and optimization process and report on how different strategies yield varying rates of success and failure across different application domains. For instance, we demonstrate that the vectorization instructions (SSE) provide a performance boost of as much as 50% on fusion and combustion applications. Moreover, we reveal how resource contention could limit the achievable performance and provide insights into how applications could exploit the hierarchical parallelism of the petascale XT5 system.
AB - An upgrade from dual-core to quad-core AMD processors on the Cray XT system at the Oak Ridge National Laboratory (ORNL) Leadership Computing Facility (LCF) has resulted in significant changes in the hardware and software stack, including a deeper memory hierarchy, SIMD instructions and a multi-core aware MPI library. In this paper, we evaluate the impact of a subset of these key changes on large-scale scientific applications. We provide insights into the application tuning and optimization process and report on how different strategies yield varying rates of success and failure across different application domains. For instance, we demonstrate that the vectorization instructions (SSE) provide a performance boost of as much as 50% on fusion and combustion applications. Moreover, we reveal how resource contention could limit the achievable performance and provide insights into how applications could exploit the hierarchical parallelism of the petascale XT5 system.
UR - http://www.scopus.com/inward/record.url?scp=70350686582&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-03869-3_33
DO - 10.1007/978-3-642-03869-3_33
M3 - Conference contribution
AN - SCOPUS:70350686582
SN - 3642038689
SN - 9783642038686
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 334
EP - 344
BT - Euro-Par 2009 Parallel Processing - 15th International Euro-Par Conference, Proceedings
T2 - Euro-Par 2009 Parallel Processing - 15th International Euro-Par Conference, Proceedings
Y2 - 25 August 2009 through 28 August 2009
ER -