TY - GEN
T1 - Implementing Arbitrary/Common Concurrent Writes of CRCW PRAM
AU - Ghanim, Fady
AU - Elwasif, Wael R.
AU - Bernholdt, David E.
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/8/9
Y1 - 2021/8/9
N2 - The Parallel Random Access Machines (PRAM) abstraction is the simplest and most elegant algorithmic model for the design and analysis of parallel algorithms. It consists of different models categorized based on the underlying memory access mode used, the most powerful of which is the Concurrent Read Concurrent Write (CRCW) model. A PRAM algorithm describes a series of rounds, each of which consists of a collection of operations that can be executed concurrently within the same time step. However, the lack of support for concurrent memory accesses and the prevalence of asynchronous programming models led to the belief that implementing CRCW PRAM algorithms is unattainable and prompted many to avoid this model except for theoretical studies of optimal performance. In this work, we study the arbitrary and common concurrent writes in the CRCW PRAM model and explore implementation challenges on general-purpose systems. Moreover, we examine current practices for implementing common/arbitrary concurrent writes and propose a new efficient lightweight and thread-safe method to implement concurrent writes through leveraging atomic instructions. To demonstrate the efficacy of our method, we developed OpenMP kernels for classical CRCW PRAM algorithms and provide experimental results and comparisons based on run time performance measured over the x86 multicore architecture. Our results show a performance speedup compared to current practices up to 4.5x across all our benchmarks.
AB - The Parallel Random Access Machines (PRAM) abstraction is the simplest and most elegant algorithmic model for the design and analysis of parallel algorithms. It consists of different models categorized based on the underlying memory access mode used, the most powerful of which is the Concurrent Read Concurrent Write (CRCW) model. A PRAM algorithm describes a series of rounds, each of which consists of a collection of operations that can be executed concurrently within the same time step. However, the lack of support for concurrent memory accesses and the prevalence of asynchronous programming models led to the belief that implementing CRCW PRAM algorithms is unattainable and prompted many to avoid this model except for theoretical studies of optimal performance. In this work, we study the arbitrary and common concurrent writes in the CRCW PRAM model and explore implementation challenges on general-purpose systems. Moreover, we examine current practices for implementing common/arbitrary concurrent writes and propose a new efficient lightweight and thread-safe method to implement concurrent writes through leveraging atomic instructions. To demonstrate the efficacy of our method, we developed OpenMP kernels for classical CRCW PRAM algorithms and provide experimental results and comparisons based on run time performance measured over the x86 multicore architecture. Our results show a performance speedup compared to current practices up to 4.5x across all our benchmarks.
KW - Arbitrary concurrent writes
KW - Crcw pram
KW - Parallel algorithms
KW - Parallel architectures
KW - Write-conflict resolution
UR - http://www.scopus.com/inward/record.url?scp=85115970493&partnerID=8YFLogxK
U2 - 10.1145/3458744.3474041
DO - 10.1145/3458744.3474041
M3 - Conference contribution
AN - SCOPUS:85115970493
T3 - ACM International Conference Proceeding Series
BT - 50th International Conference on Parallel Processing Workshop, ICPP 2021 - Proceedings
PB - Association for Computing Machinery
T2 - 50th International Conference on Parallel Processing Workshop, ICPP 2021
Y2 - 9 August 2021 through 12 August 2021
ER -