TY - GEN
T1 - Memphis
T2 - 2010 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2010
AU - McCurdy, Collin
AU - Vetter, Jeffrey
PY - 2010
Y1 - 2010
AB - Until recently, most high-end scientific applications have been immune to performance problems caused by Non-Uniform Memory Access (NUMA). However, current trends in microprocessor design are pushing NUMA to smaller and smaller scales. This paper examines the current state of NUMA and makes several contributions. First, we summarize the performance problems that NUMA can present for multi-threaded applications and describe methods of addressing them. Second, we demonstrate that NUMA can indeed be a significant problem for scientific applications, showing that it can mean the difference between an application scaling perfectly and failing to scale at all. Third, we describe, in increasing order of usefulness, three methods of using hardware performance counters to aid in finding NUMA-related problems. Finally, we introduce Memphis, a data-centric toolset that uses Instruction Based Sampling to help pinpoint problematic memory accesses, and demonstrate how we used it to improve the performance of several production-level codes (HYCOM, XGC1 and CAM) by 13%, 23% and 24%, respectively.
UR - http://www.scopus.com/inward/record.url?scp=77952562600&partnerID=8YFLogxK
U2 - 10.1109/ISPASS.2010.5452060
DO - 10.1109/ISPASS.2010.5452060
M3 - Conference contribution
AN - SCOPUS:77952562600
SN - 9781424460229
T3 - ISPASS 2010 - IEEE International Symposium on Performance Analysis of Systems and Software
SP - 87
EP - 96
BT - ISPASS 2010 - IEEE International Symposium on Performance Analysis of Systems and Software
Y2 - 28 March 2010 through 30 March 2010
ER -