TY - GEN
T1 - Scalability evaluation of barrier algorithms for OpenMP
AU - Nanjegowda, Ramachandra
AU - Hernandez, Oscar
AU - Chapman, Barbara
AU - Jin, Haoqiang H.
PY - 2009
Y1 - 2009
N2 - OpenMP relies heavily on barrier synchronization to coordinate the work of threads that are performing the computations in a parallel region. A good implementation of barriers is thus an important part of any implementation of this API. As the number of cores in shared-memory and distributed shared-memory machines continues to grow, the quality of the barrier implementation becomes critical for application scalability. There are a number of known algorithms for providing barriers in software. In this paper, we consider some of the most widely used approaches for implementing barriers on large-scale shared-memory multiprocessor systems: a "blocking" implementation that de-schedules a waiting thread, a "centralized" busy-wait implementation, and three forms of distributed "busy-wait" implementation. We have implemented the barrier algorithms in the runtime library associated with a research compiler, OpenUH. We first compare the impact of these algorithms on the overheads incurred for OpenMP constructs that involve a barrier, possibly implicitly. We then show how the different barrier implementations influence the performance of two different OpenMP application codes.
UR - http://www.scopus.com/inward/record.url?scp=77951982886&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-02303-3_4
DO - 10.1007/978-3-642-02303-3_4
M3 - Conference contribution
AN - SCOPUS:77951982886
SN - 3642022847
SN - 9783642022845
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 42
EP - 52
BT - Evolving OpenMP in an Age of Extreme Parallelism - 5th International Workshop on OpenMP, IWOMP 2009, Proceedings
T2 - 5th International Workshop on OpenMP, IWOMP 2009
Y2 - 3 June 2009 through 5 June 2009
ER -