TY - GEN
T1 - Analytical modeling and optimization for affinity based thread scheduling on multicore systems
AU - Song, Fengguang
AU - Moore, Shirley
AU - Dongarra, Jack
PY - 2009
Y1 - 2009
N2 - This paper proposes an analytical model to estimate the cost of running an affinity-based thread schedule on multicore systems. The model consists of three submodels to evaluate the cost of executing a thread schedule: an affinity-graph submodel, a memory hierarchy submodel, and a cost submodel that characterize programs, machines, and costs, respectively. We applied the analytical model to both synthetic and real-world applications. The estimated cost accurately predicts which schedule will provide better performance. Due to the NP-hardness of the scheduling problem, we designed an approximation algorithm to compute near-optimal solutions. We have extended the algorithm to support threads with data dependences. We conducted experiments with a computational fluid dynamics (CFD) kernel and Cholesky factorization on both UMA SMP and NUMA DSM machines. The results show that using the optimized thread schedule can improve the program performance by 25% to 400%, demonstrating that our method for determining an optimized thread schedule for multicore systems is efficient and practical.
AB - This paper proposes an analytical model to estimate the cost of running an affinity-based thread schedule on multicore systems. The model consists of three submodels to evaluate the cost of executing a thread schedule: an affinity-graph submodel, a memory hierarchy submodel, and a cost submodel that characterize programs, machines, and costs, respectively. We applied the analytical model to both synthetic and real-world applications. The estimated cost accurately predicts which schedule will provide better performance. Due to the NP-hardness of the scheduling problem, we designed an approximation algorithm to compute near-optimal solutions. We have extended the algorithm to support threads with data dependences. We conducted experiments with a computational fluid dynamics (CFD) kernel and Cholesky factorization on both UMA SMP and NUMA DSM machines. The results show that using the optimized thread schedule can improve the program performance by 25% to 400%, demonstrating that our method for determining an optimized thread schedule for multicore systems is efficient and practical.
UR - http://www.scopus.com/inward/record.url?scp=72049130291&partnerID=8YFLogxK
U2 - 10.1109/CLUSTR.2009.5289173
DO - 10.1109/CLUSTR.2009.5289173
M3 - Conference contribution
AN - SCOPUS:72049130291
SN - 9781424450121
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
BT - 2009 IEEE International Conference on Cluster Computing and Workshops, CLUSTER '09
T2 - 2009 IEEE International Conference on Cluster Computing and Workshops, CLUSTER '09
Y2 - 31 August 2009 through 4 September 2009
ER -